INTRODUCTION



The ERIC network of 19 decentralized information centers 
was created to serve the educational community. The creators 
of the system recognized that information is one of our nation’s 
most valuable resources. Presently, knowledge is being created 
at a rate faster than it can be assimilated. There are more than 
three hundred education and education-related journals available 
today. For a teacher or researcher, numerous sources of information
compound an already serious knowledge assimilation problem.

One of the major goals of the ERIC Clearinghouse on Read- 
ing is to prepare useful information products that organize, ana- 
lyze, and synthesize information for the busy teacher, adminis- 
trator, and researcher. One approach used in preparing such 
highly useful, interpretive papers for educators is to ask scholars 
in the field to prepare a manuscript on a specific research topic. 
Reading: what can be measured? is such an analytical paper. 

This state-of-the-art monograph is an important contribution 
to the field of reading. The author not only explores measure- 
ment problems in reading but also raises some critical issues 
concerning the reading process and instructional practices in 
reading. Basic to further progress in the field of reading is an 
understanding of the reading process and the relationship be- 
tween instructional practices and instructional objectives. 
While Roger Farr does not supply the answer or answers to understanding
the reading process or resolving the dilemma of relating
instructional practices to instructional objectives, he does
clarify measurement issues related to both these problem areas. 
In addition, he delineates some new directions for research in 
measurement and evaluation in reading. One of the major vir- 
tues of the Farr monograph is that the author provides guide- 
lines for the application of research to classroom practice. The 
Clearinghouse on Reading is proud to have sponsored this out- 
standing monograph and is pleased to publish it in cooperation 
with the International Reading Association. 

James L. Laffey 



Measurement in reading: general perspectives 



This monograph organizes and describes the research litera- 
ture on measurement and evaluation in reading. The review of 
the research is by no means exhaustive and while the major 
controversies in the field have been outlined, no attempt has 
been made to resolve them (although, in some instances, direc- 
tions for possible solutions have been offered). The mono- 
graph is intended to serve as a guide to the researcher in point- 
ing out both what is known and what is not known in measure- 
ment and evaluation in reading as well as to delineate those 
areas which need further research. The monograph also pro- 
vides guidelines for the classroom application of research and 
explains how the teacher can and should use the wide array of 
measuring devices available. A guide to tests and measuring 
devices in reading has been included as a companion piece to 
the monograph. In it are listed reading tests currently in print.
Information about the grade levels at which each test is appropriate,
the kinds of sub-tests included within the test, the number
of forms the test has, and the amount of time needed for administration
is included. In addition, the Guide makes it possible
for the teacher or researcher to obtain further information
about any particular test either by writing to publishers (whose 
addresses appear in the Guide), by checking the reviews in 
Buros’ (1968) Reading Tests and Reviews, or by consulting re- 
search which has used these tests, easily available through the 
published journal literature which is described in documents 
from the ERIC/CRIER system. 

Reading: what can be measured?



The major theme of the monograph is the use of tests in 
providing information about students’ reading achievement. 
Such information is necessary to the teacher in setting instructional
goals and in helping students to develop their reading
skills. Thus, the first step in any discussion of testing and evaluation
in reading is to define those skills which are essential to
the reading act. Once this is done, then it is possible to consider
whether reading tests accurately assess reading behavior.
Can what they measure serve as a basis for organizing classroom
instruction?

Before proceeding, something should be said about the limi- 
tations inherent in this monograph. Research on many aspects 
of measurement in reading is at best sparse. Even in those areas 
which have received a great deal of attention, more questions 
remain unanswered than answered. This monograph cannot be
expected to definitively resolve all those questions posed by research,
nor can it pretend to provide pat solutions to those
problems with which the teacher is faced in the classroom.
What it can do is summarize present research to enable the
practitioner to gain some insights into the problem of measuring
reading behavior. Hopefully the monograph will provide a
foundation for further research which will begin to provide 
more conclusive evidence on the nature of the skills underlying 
reading ability, the validity of present devices for measuring 
these skills, and the most effective means for using those devices 
which are currently available. 



Skills underlying reading ability 



In order to measure any behavior it is necessary to know 
what the basic components of that behavior are. Research has 
been far from conclusive in defining reading. Much of it has
taken the form of factor-analysis studies in which various kinds
of tests (e.g., tests of reading ability as well as tests of language
usage and general intelligence) are administered to a group of
students and the test results are then analyzed to determine
basic components of the reading act. In a review of twelve
such research studies (Traxler, 1941; Gans, 1940; Davis, 1941; 
Thurstone, 1946; Langsam, 1941; Conant, 1942; Artley, 1944;
Hall & Robinson, 1945; Harris, 1948; Maney, 1952; Sochor, 
1952; Hunt, 1952; Stoker & Kropp, 1960), Lennon (1962) 
purported to find agreement on four factors basic to reading 
which could be measured. The four factors were: 1] a general 
verbal factor, 2] comprehension of explicitly stated material, 3] 
comprehension of implicit or latent meaning, and 4] apprecia- 
tion. From a brief glance at Table 1 in which all twelve studies, 
including the test instruments used in each, and the factors each 
isolated are presented, it becomes obvious that Lennon’s inter- 
pretation of the studies was perhaps an over-simplification. 
The studies showed only limited agreement as to the number of 
factors: some named only one factor (Conant, 1942, for in- 
stance) while others (such as Davis, 1941) found six. That 
there should be such disparity is not surprising: factor-analysis 
studies are dependent on both the data collected and the man- 
ner in which it is collected. The same tests were not used in 
each study and those which were used measured a wide array of 
elements ranging from personality factors, social studies and sci- 
ence achievement, and intelligence to reading, as defined by as 
many publishers and researchers as there were tests used. Given
this situation, it is hardly surprising that the factors thought to 
comprise reading lack consistency from study to study. 

One of the more extensive attempts to define reading 
through factor analysis was carried out in a series of studies by 
Holmes (1962) and Holmes and Singer (1964, 1966) in which, 
after administering a battery of reading tests, the matrix of all 
possible correlations was analyzed. A variety of sub-factors
were isolated which Holmes and Singer believed accounted for 
the variance in students’ speed of reading and power of reading 
and were, therefore, central to reading ability in general. The 
kinds of factors which they found are listed in Table 2. These 
particular factors appeared with fifth-grade students. 
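
As a rough illustration of the first step in such factor-analysis studies, the sketch below builds the matrix of all possible correlations among a small battery of tests. The sub-test names and scores are invented for illustration and are not taken from Holmes and Singer’s data; the later step of extracting factors from the matrix is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(scores):
    """Correlate every test with every other test (and with itself)."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


# Hypothetical scores for six students on three invented sub-tests.
scores = {
    "vocabulary": [12, 15, 9, 20, 17, 11],
    "speed": [30, 34, 25, 41, 38, 27],
    "comprehension": [8, 10, 7, 14, 12, 6],
}

matrix = correlation_matrix(scores)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in scores])
```

Because the matrix depends entirely on which tests are entered and who takes them, a different battery would yield a different matrix, which is one reason the factors reported vary from study to study.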

Table 1 (fragment). Langsam (1941). Factors isolated: verbal (word meaning), perceptual, word (fluency), seeing relationships, numerical. Tests used: 6 reading tests (14 sub-tests), 1 intelligence test (7 sub-tests).

Holmes and Singer’s research, subsequently labelled the
substrata-factor theory of reading, has been criticized on both
theoretical and technical levels (Sparks & Mitzel, 1966). The 
most serious question raised about the theory was put forth by 
Raygor (1966) and it applies to all such factor-analysis studies. 
How valid and reliable are the tests used to gather the data? 
Obviously, the validity and reliability of a given test determine
the validity and reliability of any conclusions derived from its 
use. Another criticism leveled at the substrata-factor theory is 
that no comprehensive explanation of the skills needed for read- 
ing can be based solely on the results of reading tests. Factors 
such as personality variables, socio-economic background, and 
psycholinguistic experience have to be included. In fact, Good- 
man (1969) has gone as far as to suggest that Holmes and 
Singer have not developed a theory at all; instead, Goodman 
claimed, they have merely manipulated statistics generated by a 
set of reading tests. 

That attempts have been made to define reading by examin- 
ing performance on reading tests is not surprising since, on the 
surface, it appears a logical procedure. However, such at- 
tempts are severely limited. Performance on any one reading 
test is only a sample of an individual’s behavior in one given sit- 
uation under a single set of conditions. Significant differences 
in performance can occur when the time of day of test adminis- 
tration, the content of the reading material on the test, or the 
examiner administering the test is varied.

Several researchers have attempted recently to define read- 
ing in psycholinguistic terms. Goodman (1969) has developed 
a theory of reading which accounts for the nature of language 
and the reader’s psycholinguistic background. According to 
Goodman, reading is a form of information processing: it occurs 
when an individual selects and chooses from the information 
available to him in an attempt to decode graphic messages. 
Thus, Goodman suggested that perhaps the reading process can- 
not be fragmented. Ryan and Semmel’s (1969) review of re- 
cent psycholinguistic theories of reading substantiated Good- 
man’s point of view. They concluded: 
Research has demonstrated that the reader does not process
print sequentially, but rather in a manner which reflects
his use of language at every opportunity.
Expectancies about syntax and semantics within contexts 
lead to hypotheses which can be confirmed (or discon- 
firmed) with only a small portion of the cues available in 
the text. Thus, not all the information needed by the 
reader is on the printed page — nor are all the printed de- 
tails needed by him. (1969, p. 82) 

If one were to extrapolate components of reading behavior from 
these psycholinguistic theories, they would probably include the 
ability to use knowledge of written syntax, knowledge of words 
used in context, and knowledge of how to use phonological
cues.

Perhaps the psycholinguistic approach will provide a more 
viable definition of reading and lead to a more solid basis for 
test construction. It may well be that research will find, as the 
proponents of psycholinguistic theory have suggested, that at- 
tempts to define reading sub-skills on a group basis are fruitless. 
In that case, measurement in reading would have to be based on 
whether a reader has a strategy for decoding written messages 
and whether he understands reading as a communication pro- 
cess rather than whether he can simply decode written symbols, 
supply the meanings of words in isolation, or answer multiple- 
choice questions based on a literal understanding of a selection. 
Until research is carried out to develop tests which take into ac- 
count the elements psycholinguistic theorists are finding central 
to reading ability, the teacher will still need to use present sub- 
tests of reading to evaluate reading ability, but this use of sub- 
tests must be done cautiously. Present reading tests can be 
helpful if the sub-tests are recognized merely as measuring the 
readers’ different ways of interacting with printed messages and 
together are taken to represent a measure of the student’s ability
to utilize text material effectively.

It should be obvious by now that research has provided no 
clear-cut theoretical definition of reading and it is likely that this 
will be the case for some time to come. Yet, the classroom 
teacher needs some kind of operational definition or at least
some idea of what is involved in reading in order to proceed
with instruction. While research is far from unanimous that
any one skill or combination of skills is, in fact, central to
reading, there is general agreement that some skills are related 
to reading, even if that relation is questionable. The kind of 
skills which a teacher should be able to assess would center 
around the learner’s ability to decode written symbols, the ex- 
tent of his sight vocabulary, his knowledge of word attack skills, 
and his fluency in oral reading. Beyond this level of decoding, 
the teacher should have some idea of the learner’s comprehen- 
sion abilities, his ability to determine the pronunciation and 
meaning of words, his ability to read for the main idea, and his 
understanding of the author’s intent. One skill area which is in- 
dicative of a mature reader and which is often overlooked by 
the teacher is the learner’s ability to set his own goals and pur- 
poses for reading. Sub-skills here would involve the extent to 
which independent reading habits have been established, 
whether what is read can be applied to the solution of practical 
problems, and whether newly acquired information can be inte- 
grated with that obtained through previously read materials. 
Most of the skills which have just been mentioned can be meas- 
ured by either formal or informal tests. Doubtless, there are 
other skills which were not included here, which are capable of 
measurement. For the purposes of this discussion, however, it 
suffices to point out that while there is no consensus as to what 
reading is, the teacher can still use tests which are based on 
seemingly inadequate theoretical foundations. Later chapters 
of this monograph deal in detail with the kinds of testing instru- 
ments that can provide the information described above and 
how these instruments might be used by the classroom teacher. 



Variables affecting reading performance: 
the student’s background 

Measurement and evaluation in reading programs usually 
are concerned with determining how well a student reads. How
well the student reads is influenced, to some extent, by the
experiential background which he brings to the classroom
and over which the classroom teacher has only partial control. 
Factors such as sex, socio-economic background, and personal- 
ity do exert some influence. The problems that these present to 
the test user, however, are a matter of the degree of influence 
they exert on test performance. Do they so distort the per- 
formance that an accurate assessment of students’ skills be- 
comes impossible? A review of all the studies of the effect of 
student background on test performance is beyond the scope of 
this monograph. The studies included here emphasize that test 
performance cannot be the only means of assessing student ca- 
pacity since it represents only a single sample of an individual’s 
behavior which is affected by many immediate and long-term 
factors. 

Sex differences 

Of all the factors influencing test performance, sex differ- 
ences have received the greatest amount of research attention. 
Their importance has been shown to vary at different age levels 
and to depend on a number of influences. Traxler and Spauld- 
ing (1954) compared the performance of 200 boys and 200 
girls in private New York City area schools. Girls in grades 
three, five, and seven performed consistently higher than the 
boys in spelling and language, but the two groups were about 
equal in word meaning and paragraph meaning as measured by 
the Stanford Achievement Test. Traxler and Spaulding (1954, 
p. 80) suggested that separate sex norms should not be pro- 
vided in the Stanford test because of the extensive overlap in 
achievement at various grade levels and because of the “similar- 
ity of the educational goals of boys and girls in independent ele- 
mentary schools.” However, caution should be exercised in in- 
terpreting these results because of the absence of statistical 
analysis in the study. In a more carefully controlled study, 
Hughes (1953) found that girls read significantly better than
boys in grades three and four, but that these differences were 
not sustained beyond the fourth grade. American culture may 
well be the element in promoting differences by sex in reading 
performance. In Preston’s (1962) study of American and Ger- 
man students, the superiority of girls over boys in the case of 
American children was reversed with German children. This 
was attributed to German cultural influences such as the pre- 
dominantly masculine teaching body in Germany. 

Studies of sex differences in reading test performance are 
generally quite consistent regarding American children: girls do 
perform better than boys, especially during the first years of 
school. However, few reading tests have taken this into account.
Only a handful, like the Gray Oral Reading Test, provide
separate norms for each sex. Traxler and Spaulding (1954) 
examined one hundred reading tests randomly selected from the 
files of the Educational Records Bureau. Only six tests pro- 
vided separate sex norms; in addition, in the manuals of the one 
hundred tests surveyed only seven made any reference to the 
existence of sex differences. 

While sex is a statistically significant variable affecting test 
performance, is it important in instruction? If understanding 
the existence of sex differences leads to a careful examination of 
the cause and subsequent adjustments in reading instruction, it 
is an important finding. However, because it is perhaps im- 
practical to provide separate reading programs for boys and 
girls, any suggestion that separate test norms should be provided 
for boys and girls is probably not a valid one. In addition, most 
standardized tests are designed for comparing groups of children 
without regard to sex differences and the norms provided by the 
better test publishers carefully control variables such as sex, 
usually by random sampling procedures. 

Socio-economic status 

The influence of socio-economic status on test performance 
has become an extremely controversial issue. In the famous
Hobson vs. Hansen case in the District of Columbia, a group of 
parents charged the school district with unconstitutionally de- 
priving disadvantaged Negro pupils of equal access to educa- 
tional opportunities. Included among the charges of discrimi- 
natory practices was the selection of biased tests for the place- 
ment and evaluation of pupils within the school (Lennon, 
1968). 

Reading test performance and socio-economic status have 
been shown to be highly related at all levels from the first grade 
through college. In a study of the relationship between socio- 
economic status and a number of variables including reading 
comprehension and vocabulary achievement, Hill and Giam- 
matteo (1963) found that in a population of third-grade chil- 
dren, the high socio-economic group was eight months ahead of 
the low socio-economic group in vocabulary achievement. In 
reading comprehension, the range between the groups was 
equivalent to a full school term or nine months. Carson and 
Rabin (1960) investigated the verbal comprehension and ver- 
bal expression in Negro and white children. While this study 
did not use a reading achievement test as one of the variables, 
the importance of socio-economic class on test performance is 
worth noting. Three groups were studied: southern Negroes, 
northern Negroes, and northern whites. Subjects were matched 
for age, grade placement, sex, and level of comprehension; all 
the subjects were in the fourth, fifth, or sixth grades. Carson 
and Rabin found that white children scored higher than Negro 
children and that northern Negro children scored higher than 
southern Negro children on tests of verbal comprehension and 
communication. 

The importance of the high correlation between socio- 
economic status and reading test performance is not in under- 
standing that these differences exist, but rather in understand- 
ing what can be done to correct them. A first step in this direc- 
tion was undertaken by Boykin (1955). Boykin studied the 
reading performance of Negro college students to assess in
greater detail their reading problems, needs, and capabilities.
Boykin’s subjects scored only two-thirds as high as the norming
population on the Cooperative English Test: Reading Comprehension.
The group also achieved lowest on vocabulary
and highest on level of comprehension, while the norming
population for the Cooperative test had scored highest on
speed of comprehension. Further examination of Boykin’s
data indicates that the difference between the norming population
and Boykin’s subjects was about three-fourths of a
standard deviation on level of comprehension; on speed of 
comprehension and vocabulary, the difference was about one 
and a half standard deviations. On all three sub-tests, the 
norming population scored significantly higher. Boykin’s con- 
clusion was not that the Cooperative test was inappropriate for 
Negro students, but that further research should be carried out 
to determine why the Negro students scored so poorly on it. 
Such studies, Boykin argued, should be focused on planning 
programs for improving the reading skills of Negro college 
students. 
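
The comparison in standard deviation units used above amounts to a simple calculation, sketched here. The means and standard deviation are hypothetical figures chosen only to show the arithmetic; they are not Boykin’s data or the published norms of the Cooperative test.

```python
def standardized_difference(norm_mean, group_mean, norm_sd):
    """Gap between the norming-population mean and a group mean,
    expressed in units of the norming population's standard deviation."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (norm_mean - group_mean) / norm_sd


# Hypothetical sub-test figures (assumed for illustration only).
gap = standardized_difference(norm_mean=50.0, group_mean=42.5, norm_sd=10.0)
print(f"{gap:.2f} standard deviations")  # prints "0.75 standard deviations"
```

Expressing the gap this way allows sub-tests with different score scales to be compared directly.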

Socio-economic status and reading disability have also been 
shown to be highly related in studies with other disadvantaged 
groups (Chandler, 1966; Anastasi & D’Angelo, 1952; Klineberg,
1947). These studies are valuable not only because of
the effect they have on the testing process, such as the search for
culture-free or culture-fair tests, but also because of the contribution
they can make to increasing educational opportunities
through better teaching and school programs. After all, the 
fault does not lie with the tests or with the student; it lies with 
society and the educational system which produced the test per- 
formance. 

Personality variables 

Personality variables also seemingly affect student perform- 
ance. A student’s attitude toward a test, his concept of his own 
ability to perform on it, his physical well-being, and the attitude
of his parents and siblings may well influence his performance. 
Sheldon and Carrillo’s (1952) study of this problem compared 
students’ reading performance on the Progressive Reading Test 
(now the California Achievement Tests) to home background 
information gathered through a questionnaire sent to the stu- 
dents’ parents. A summary of their results indicated that stu- 
dent attitudes toward education strongly influenced their reading 
test performance and that these attitudes appeared to be shaped 
by parental attitudes. 

In another study Edwards (1962) tried to assess students’ 
attitudes toward reading by administering a concept test in 
which the students were asked to choose phrases characteristic 
of good readers. In a pilot study of six students reading six 
months above mental grade placement and six students reading 
six months below mental grade placement, a positive relation- 
ship was found between acquired concept of reading and the 
score on a reading achievement test. Further experimentation 
with a larger sample size did not reveal any significant correla- 
tion. Studies like Edwards’ should always be interpreted cau- 
tiously because of the reliance on the correlation coefficient. In 
such studies, it is not always possible to determine which factor 
is the cause, which is the result, or whether some third factor is 
affecting both variables. 
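
One conventional way to examine the third-factor possibility is the first-order partial correlation, which estimates the correlation between two variables with a third held constant. The sketch below uses the standard formula; the three coefficients are invented for illustration and do not come from Edwards’ study. It addresses only the third-variable question and says nothing about which factor is the cause.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant (first-order partial)."""
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical coefficients: x = attitude toward reading, y = reading score,
# z = some third factor related to both (assumed for illustration only).
r = partial_correlation(r_xy=0.48, r_xz=0.60, r_yz=0.80)
print(round(r, 3))
```

With these invented coefficients the partial correlation is essentially zero: the apparent relationship between x and y is fully accounted for by z.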

Other variables influencing test results 

The choice of a particular test has also been shown to exert 
a great influence on students’ reading grade scores (Ware, 
1956). The demographic characteristics of the population used 
in norming a test, the reading difficulty level of the test used, 
and the relationship of a test to the specific objectives of the in- 
structional program can all influence grade scores on that test. 
The effect of using an inappropriate test with a particular stu- 
dent is perhaps the most serious of these problems. If a test 
does not extend low enough for a poor reader or high enough
for a good reader, an inaccurate estimate of reading
ability results.



Variables affecting reading performance: 
the reading program 

While evaluation and measurement in reading have focused 
primarily on students’ performance, there are variables within 
the reading program itself which influence that performance and 
which can be measured. Such variables include teaching proce- 
dures, the training and personality of the teacher, instructional 
materials, the physical setting for the reading program, and curriculum
organization. Not all the studies dealing with these variables
are reviewed here. Only two of them (the difficulty
level of materials, and teaching procedures and teacher knowledge
of those procedures) are discussed in any detail. It is
hoped that this brief overview will emphasize the need for re- 
search on this aspect of the reading program. 

Difficulty level of materials 

The difficulty level of reading materials has probably re- 
ceived the most research of any instructional element within the 
reading program. The vast majority of studies on readability 
have tried to define the relationship of the number of words
and syllables to the difficulty of the selection. Chall’s (1958)
monograph, Readability: An Appraisal of Research and Appli- 
cation, is a comprehensive review of readability research. In it, 
Chall organized the research under three main categories: 
quantitative associational studies, surveys of expert and reader 
opinion, and experimental studies of one factor. The most 
commonly-reported type of study was the quantitative associational
one, in which the outcome was “the readability formula
based on the counting and weighing of several significant factors
in the printed material to predict the reading skill necessary to
understand it” (Chall, 1958, p. 155).
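
To make the idea of counting and weighting concrete, the sketch below computes the Flesch Reading Ease score, one widely used formula of this type: 206.835 minus 1.015 times the average sentence length in words, minus 84.6 times the average number of syllables per word. The sentence splitter and the syllable counter are crude assumptions made for this example (syllables are estimated by counting runs of vowels), not Flesch’s own counting rules, and the sample passages are invented.

```python
import re


def count_syllables(word):
    """Crude estimate: one syllable per run of vowels (an assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Higher scores indicate easier material."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Invented sample passages.
easy = "The cat sat on the mat. The dog ran to the cat."
hard = ("Psycholinguistic investigations necessitate comprehensive "
        "methodological considerations.")

print(round(flesch_reading_ease(easy), 1), round(flesch_reading_ease(hard), 1))
```

Only word and sentence counts enter the calculation; nothing in it reflects the concepts, organization, or interest of the material.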

Early studies of readability emphasized vocabulary difficulty 
and average sentence length, both usually determined by count- 
ing words or syllables. More recent studies have attempted to 
assess more complex grammatical aspects of written prose. 
Bormuth (1965a, 1965b, 1966) has used the cloze procedure 
in several studies to investigate some of the underlying gram- 
matical factors which are related to the reading difficulty of text 
material. Bormuth (1967) computed the correlation between 
comprehension of independent clauses and the frequency of in- 
dependent clauses, mean word depth, and length (measured in 
letters). He concluded that all three factors had a significant 
correlation with comprehension, but that the frequency variable 
was too small to be of value in predicting readability. 
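
For readers unfamiliar with the cloze procedure, the sketch below shows its general form: words are deleted from a passage at regular intervals and the reader is scored on how many he can restore. Deleting every fifth word and counting only exact replacements are common conventions assumed here; they are not details drawn from Bormuth’s studies, and the passage is invented.

```python
def make_cloze(text, every=5, blank="_____"):
    """Replace every `every`-th word with a blank; return the test and the answer key."""
    words = text.split()
    answers = {}
    for i in range(every - 1, len(words), every):
        answers[i] = words[i]
        words[i] = blank
    return " ".join(words), answers


def score_cloze(answers, responses):
    """Proportion of deleted words restored exactly (case and punctuation ignored)."""
    if not answers:
        raise ValueError("no words were deleted")

    def normalize(word):
        return word.strip(".,;:!?").lower()

    correct = sum(
        1 for position, word in answers.items()
        if normalize(responses.get(position, "")) == normalize(word)
    )
    return correct / len(answers)


# Invented passage for illustration.
passage = ("The teacher asked the students to read the short story "
           "and then answer several questions about it.")
cloze_text, answers = make_cloze(passage)
print(cloze_text)

# A hypothetical reader restores two of the three deleted words.
responses = {4: "students", 9: "book", 14: "questions"}
print(score_cloze(answers, responses))
```

The proportion of words restored serves as the index of how difficult the passage is for that reader.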

A number of researchers have attempted to validate read- 
ability formulas by comparing readability scores with reading 
comprehension. A study of the comprehension of newspaper 
articles which were written at both easy and difficult levels ac- 
cording to the Flesch and Dale-Chall formulas was conducted 
with a group of adult employees of a midwestern company 
(Swanson & Fox, 1953). Differences in comprehension be- 
tween the two versions were significant; however, the easier ver- 
sion did not attract more readers than the difficult version. 
Swanson and Fox pointed out that factors such as motivation 
and interest are at least as important as sentence length and vo- 
cabulary difficulty in attracting readers and in determining re- 
tention of information. 

Several researchers who have validated and correlated read- 
ability formulas have suggested that while the formulas can 
provide an indication of relative difficulty of material, more ex- 
tensive studies are needed to determine the effect of a broader 
range of factors. Russell and Fea (1951) in such an investiga- 
tion of the Dale-Chall, Flesch, Lewerenz, Lorge, Washburne- 
Morphett, and Yoakam readability formulas stressed that the 
formulas do not:

1] give any measure of conceptual difficulty in the textual
material,

2] take into consideration the way the material is organ- 
ized or arranged, 

3] allow for variations in the meaning of multiple mean- 
ing words, 

4] accept the fact that a fresh or unusual word may 
make a sentence or idea clearer than a commonplace 
word, 

5] vary their ratings in terms of different interests which 
persons may have at different developmental levels or 
in individual activities, 

6] provide measures of difficulty below the fourth-grade 
level, and 

7] take account of physical factors such as format and il- 
lustrations. 

Because of the above factors, publishers do not generally seem 
to pay much attention to readability formulas. Mills and Rich- 
ardson (1963) sent out questionnaires to twelve well-known 
publishers of children’s books, asking if they used readability 
formulas in text preparation. Despite a great deal of follow-up 
effort, only seven questionnaires were returned. In half of 
these, the publishers responded negatively. Two of the publish- 
ers were quite disturbed at the suggestion that such formulas 
should be used: one stated that the wide range of reading abilities
at a single grade in various parts of the country rendered
readability formulas very unreliable; the other indicated that ac- 
tual readability is probably not a function of mechanical factors, 
but rather is derived from motivational factors.

A number of
studies have been designed to demonstrate the effect of the 
readability of a test on student performance. Levy (1958) ad- 
ministered a revised form of the Study of Values to three groups 
with varying reading abilities. For the poorer readers, the two 
forms of the test were found not to be equivalent. Thus, Levy 
emphasized the importance of reading ability in all pencil-and-paper
personality tests. Johnson and Bond (1950), after studying
the reading difficulty of ten standardized group tests of personality
and intelligence, concluded that many of the tests would
be too difficult for the less able readers to comprehend and
would, therefore, not be valid measures of the trait being studied.

Research on the reading difficulty of materials has focused 
on a very narrow range of factors. While the most recent stud- 
ies have employed a broader spectrum of grammatical elements, 
they have still neglected personality, motivation, and interest 
variables. It is clear from an overview of classroom practice 
that the results of the many studies on the readability of stand- 
ardized tests have not had widespread application: test con- 
sumers more often than not have failed to take into account the 
reading difficulty of individual test items in assessing various 
personality traits and student abilities. 

The concentration of research on the difficulty level of materials
has overshadowed the importance of research into the effectiveness
of those materials. Probably the reason for this has
been the lack of consensus as to what the criteria should be for 
evaluating effectiveness. Recently, Goodman, Olsen, Calvin, 
and Vanderlinde (1967) have developed criteria for such eval- 
uation. Their criteria include psychological, socio-cultural, ed- 
ucational, linguistic, and literary principles. What is badly 
needed now are studies which will employ these criteria and 
focus on making materials both more readable and more
effective.

Teacher effectiveness

Research on teacher effectiveness has been
based almost exclusively on student performance on standardized
tests. While this is certainly an acceptable criterion for
evaluating the effectiveness of instruction and does reflect the 
effect of the teacher on student performance, there are many 
other ways to evaluate teaching that do not rely solely on stu- 
dents’ test performance. In fact, there is a vital need to analyze 
teacher behavior in and of itself; it might well lead to a better
and clearer conception of those factors which contribute to improved
student performance.

The inconclusiveness of the U.S. Office of Education first-grade
studies (Bond & Dykstra, 1967) cogently demonstrated
both the importance of clearly delineating the experimental
procedures being compared and the amount of work this requires.
In the first-grade studies, it was virtually impossible
to know what was actually being compared within each
of the 27 projects since many of the techniques purported to be 
different were, in fact, quite similar. 

Prior to any evaluation of teacher behavior, a specific de- 
scription of what constitutes good teaching is necessary. 
Hughes and his colleagues (1959) have identified a number of 
behaviors which could serve as standards for assessing effective 
instruction. Hughes indicated that if a child is to develop ade- 
quate communication skills, he must have opportunity to talk 
and listen to others. Therefore, the teacher’s responses to a
student should include the following: 

1] seeking the student’s opinion and experience; 

2] giving the student an opportunity to use a variety of 
media of communication; 

3] giving him a model of standard language usage; 

4] providing him with a variety of books and other read- 
ing materials; 

5] seeking to further his purposes in reading; 

6] giving him opportunity to compare his reading with 
his new experience and to draw inferences and gener- 
alizations from his reading; 

7] seeking the child’s own idiomatic response in writing 
and other media of expression. 

Other researchers, such as Sears (1963), Wallen and Wodtke 
(1963), and Spaulding (1963), have attempted to define 
“good” teaching through an inductive approach. They have in- 
vestigated those teacher behaviors which appear to have the 
most positive influence on student behaviors. They also have
developed elaborate lists of the behaviors characteristic of good 
teaching which include the teacher’s willingness and ability to 
alter his behaviors to meet varying situations, to understand the 
students’ point of view, to try new procedures, to ask effective 
questions, to use positive reinforcement of student behaviors, 
and to continue learning in a wide variety of subject areas. 

Teachers’ knowledge of specific skill areas involved in read- 
ing instruction has been explored by several researchers (Shel- 
don, 1960; Ramsey, 1962; Schubert, 1959). While their work 
as a whole has provided valuable information about the extent 
and limitations of teachers’ knowledge, the information has not 
been applied to outcomes in the teaching of reading. In other 
words, research has failed to relate how a lack of knowledge of 
phonics, for instance, would influence teacher behavior and ef- 
fectiveness. In one study of this type, Spache and Baggett 
(1965) administered a phonics knowledge test to a group of 99 
graduate students enrolled in a graduate reading course. 
Ninety-three class members were in-service teachers pursuing 
graduate credit in reading. A very serious deficiency in teacher 
knowledge of phonic and syllabication rules was found. 
However, the investigators failed to provide evidence which 
would indicate the importance of this kind of knowledge to in- 
struction. The implied assumption seemed to be that the ability 
to perform at a high level on a phonics principles and syllabica- 
tion test is a very important element in the successful teaching 
of reading. Despite the fact that this is a logical assumption, 
failure to provide hard evidence does seriously limit any infer- 
ences that can be drawn from the study. 

A variety of evaluation techniques need to be developed to 
enable the teacher to make an adequate assessment of his own 
instruction. Goodson (1965) developed such a multiple ap- 
proach by analyzing the literature in reading and identifying 
areas which were essential to the competent reader: sight vo- 
cabulary, word attack, word meaning, mechanics of oral and/or 
silent reading, taste and enjoyment in reading, study skills, critical
comprehension, and literal and interpretive comprehension.
Goodson used these areas to develop a classroom observation 
guide, a questionnaire of instructional problems, and an inven- 
tory of teacher beliefs concerning the teaching of reading. He 
then tested these instruments out on nine educators and con- 
ducted studies of reading programs in 14 teachers’ classes. 
While the conclusions of this study were not based on statistical 
analyses, Goodson found that the teachers and supervisory personnel
who used the instruments considered them helpful in
improving their reading instruction.

To determine the proficiency of elementary teachers in using 
a wide variety of information to improve reading instruction, 
Burnett (1961) constructed a problem-solving test. Significant 
differences in performance on the problem-solving test were 
found between reading specialists, undergraduate elementary 
education students, and experienced teachers. The reading 
specialists scored highest, the experienced teachers scored next
highest, and the undergraduate elementary education majors
scored lowest. The usefulness of these results is limited because
Burnett did not relate this problem-solving ability
to actual classroom teaching.

Some researchers, however, have discussed teacher knowl- 
edge in the context of student performance in the classroom. 
To measure teacher skill, Wade (1960) constructed a test con- 
sisting of ten problems. Those who scored highest were consid- 
ered skillful in selecting books of the proper difficulty level, in 
placing children into homogeneous reading groups, in judging 
the amount of reading gains that pupils achieved after classroom 
instruction, in observing specific reading skill deficiencies, in 
diagnosing and correcting phonic and syllabication errors, in or- 
ganizing a child’s own word perception errors into meaningful 
instructional categories, and in recognizing the goals of various 
kinds of reading workbook exercises. The statistical analysis 
revealed that teacher performance on Wade’s test was signifi- 
cantly related to student performance on a standardized reading 
test. 
As more research begins to specify and define specific
teacher behaviors and to relate these behaviors to accepted cri- 
teria for what constitutes a good reading program, adequate 
evaluation of teaching and teaching procedures will become 
possible. Another method of attack in evaluating teaching 
method and teacher effectiveness might be to ask teachers them- 
selves to identify those areas in which they believe themselves to 
need additional training. If the sheer number of investigations 
carried out along these lines is any indication, this is a very 
popular technique for evaluating teacher skills. Typical of 
these questionnaire studies is one conducted by Hester (1953) 
in which teachers were asked to list the most serious problems 
they faced in reading instruction and indicate those problems 
with which they needed the most help. One interesting finding 
was that teachers wanted this help within their regular classroom
situation. A second major result was that teachers seemed
to be relatively unconcerned with the teaching of reading in the
content areas. Questionnaire studies were deemed valuable by 
Hester for determining those areas of teacher weakness which 
stand in need of remediation. Other studies using questionnaires
to determine teacher skills and needs were carried out by
Aaron (1960) and Purcell (1958).

While questionnaire studies may point out some teacher- 
perceived weaknesses, many teachers are unable to identify 
those areas crucial to the teaching of reading in which they lack 
knowledge. In short, it appears that many teachers do not 
know what they do not know. One study which underlined this 
was initiated by Wilt (1950). Teacher awareness of listening 
as a factor in elementary education was compared with the ac- 
tual amount of time spent on listening in a classroom. Wilt ad- 
ministered a questionnaire to teachers to determine the percent- 
age of the school day that they expected children to listen and 
the relative importance they placed upon listening as compared 
to other facets of language arts instruction. To verify the
teacher answers, classroom observations were conducted. The 
most significant finding on the questionnaire was that teachers 
expected children to spend more time learning through reading
than through any other language skill. Observations of teacher 
practices did not bear out the results of the questionnaire sur- 
vey. In actual classroom activity, 57.5 per cent of the time was 
spent by the children in listening. When speaking and writing 
time were further subtracted from the remaining 42.5 per cent, 
it was obvious that the teachers had considerably overestimated the
amount of time that the children would spend learning
through reading. Wilt’s findings should certainly cause those 
who are evaluating teacher behaviors through questionnaire 
techniques to be cautious in interpreting results. 
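
The arithmetic behind Wilt's comparison can be made explicit. The short Python sketch below is illustrative only: the 57.5 per cent listening figure is the one reported above, whereas the speaking and writing shares used in the second call are hypothetical placeholders, since those figures are not given here. The function name is likewise an assumption introduced for the example.

```python
def reading_time_ceiling(listening_pct, speaking_pct=0.0, writing_pct=0.0):
    """Return the largest share of class time (in per cent) left for reading.

    Whatever time is not spent listening, speaking, or writing is the most
    that could have been spent learning through reading.
    """
    for share in (listening_pct, speaking_pct, writing_pct):
        if not 0.0 <= share <= 100.0:
            raise ValueError("each share must lie between 0 and 100 per cent")
    remaining = 100.0 - listening_pct - speaking_pct - writing_pct
    if remaining < 0.0:
        raise ValueError("the shares add up to more than 100 per cent")
    return remaining


# Listening share observed in the classrooms, as reported above.
ceiling = reading_time_ceiling(57.5)

# Hypothetical speaking and writing shares, for illustration only.
example = reading_time_ceiling(57.5, speaking_pct=10.0, writing_pct=10.0)

print(ceiling, example)
```

With listening alone taken into account, reading could occupy at most 42.5 per cent of the time; every additional point given to speaking or writing lowers that ceiling further, which is why reading could not have been the dominant mode of learning the teachers expected it to be.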

Austin and Morrison (1963) undertook a comprehensive 
nationwide study of the reading instruction in elementary 
schools. Questionnaires were used as well as classroom obser- 
vations. One facet of the investigation was to compare prob- 
lems in the teaching of reading reported by supervisory person- 
nel and those reported by classroom teachers. Problems identi- 
fied by supervisory personnel included: 

1] providing for individual differences

2] teaching reading skills appropriate for the interme- 
diate grades 

3] teaching reading skills in the content areas 

4] appropriate utilization of available materials 

5] pacing 

6] organizing children into flexible groups 

7] creative teaching 

8] understanding broad aspects of the reading program 

9] understanding phonic principles 

10] teaching children how to identify unfamiliar words 

Teacher perceptions differed somewhat from those of the su- 
pervisory personnel. They believed the most frequent weak- 
nesses of their reading programs to be: 

1] the paucity and kind of materials available

2] the lack of motivation provided by the content of
reading books 

3] the lack of phonic practices in workbooks 

4] the lack of aid in providing for homogeneous groups 
with large classes of children 

5] a lack of sufficient time to teach basic reading skills 

6] lack of necessary guidance from administrators 

Studies such as the one by Austin and Morrison can provide 
reliable insights into the evaluation of the teaching of reading 
when they are based on both questionnaires and careful classroom
observation. The problems in evaluating teaching are
perhaps best exemplified by an analysis of the research con- 
ducted by Anderson and Hunka (1963). They concluded that 
research in teacher evaluation has been unproductive and has 
reached a dead end because of problems encountered in devel- 
oping suitable criteria variables. They were alarmed not only 
because of the lack of these criteria variables, but also by the 
absence of reliable measurement for those variables which have 
been identified. 

Another general problem in assessing teaching is the inabil- 
ity of research to isolate and define behaviors. The problems of 
conducting methods studies in teaching reading are always com- 
pounded by the lack of control over and description of teacher 
behaviors. It seems that research in this area must begin to 
focus on a broader spectrum of variables than merely teacher 
knowledge. Methods need to be developed for analyzing other 
aspects of teacher behavior including such facets as teacher mo- 
tivation and personality. Most of all, future research should re- 
veal how these variables promote effective teaching. 



In conclusion 

The preceding discussion is by no means definitive in terms 
of the kinds of variables which can affect measurement, but 
hopefully the reader will bear these in mind as he goes through 
the remainder of the monograph. The major concern of the
present monograph is not which variables influence student
performance; rather, it focuses on the kinds of measurement devices
research has made available, what these devices measure,
how they can be used, and their validity and reliability. The 
specific areas covered are organized around the measurement of 
specific skills related to reading, the types of testing procedures 
that exist for measuring these abilities, the evaluation of reading
growth, and, last, but not least, measures of reading-related 
functions. 

It is important, throughout the following pages, to bear in 
mind the fact that the art and science of measurement and evaluation
is inextricably intertwined with all phases of reading education.
It is virtually impossible to review research in this area
without touching on all phases of the psychology and teaching of 
reading. At the same time, most reviews of measurement pro- 
cedures in general cannot avoid discussing reading abilities. 
The broad study of measurement and evaluation presents a par- 
adox in education. Research knowledge far outstrips classroom 
practice. Part of the problem is caused by the development of 
a technical vocabulary by the researcher which is seldom under- 
stood by most teachers. Added to this has been the deification 
of tests and test scores on the part of the classroom teacher. 

Finally, the focus of this monograph is on presenting infor- 
mation and analyses useful to both the practitioner and re- 
searcher who want to keep abreast of the present state of 
knowledge in reading measurement. While academic and 
scholarly measurement problems are not avoided, they are 
deemphasized unless they form an integral part of the research 
under consideration. 








References 



Aaron, I. E. What teachers and prospective teachers know about phonic 
generalizations. Journal of Educational Research, 1960, 53, 323-30. 

Anastasi, Ann, & D’Angelo, Rita Y. A comparison of Negro and 
white preschool children in language development and Goodenough 
Draw-a-Man I.Q. Journal of Genetic Psychology, 1952, 31, 147-65. 

Anderson, C. C., & Hunka, S. M. Teacher evaluation: some problems 
and a proposal. Harvard Educational Review, 1963, 33, 74-96. 

Artley, A. S. A study of certain relationships existing between general 
comprehension and reading comprehension in a specific subject matter 
area. Journal of Educational Research, 1944, 37, 464-73. 

Austin, Mary C., & Morrison, C. The first R. New York: Macmillan 
Company, 1963. 

Bond, G. L., & Dykstra, R. The cooperative research program in first-grade
reading instruction. Reading Research Quarterly, 1967, 2 (4),
5-142.

Bormuth, J. R. Validities of grammatical and semantic classifications of 
cloze test scores. In J. A. Figurel (Ed.), Reading and inquiry. Pro- 
ceedings of the International Reading Association, 1965, 10, 283-86. (a) 

Bormuth, J. R. Optimum sample size and cloze test length in readability 
measurement. Journal of Educational Measurement, 1965, 2, 111-16. 
(b) 

Bormuth, J. R. Readability: a new approach. Reading Research Quar- 
terly, 1966, 1 (3), 79-132. 

Bormuth, J. R. Comparable cloze and multiple-choice comprehension 
test scores. Journal of Reading, 1967, 10, 291-99. 

Boykin, Leander L. The reading performance of some Negro college 
students. Journal of Negro Education, 1955, 24, 435-41. 

Burnett, R. W. The diagnostic problem solving proficiency of elementary 
teachers in teaching reading. Unpublished doctoral dissertation, Indiana
University, 1961. 

Buros, O. K. Reading tests and reviews. Highland Park, N. J.: 
Gryphon Press, 1968. 

Carson, A. S., & Rabin, A. I. Verbal comprehension and communication
in Negro and white children. Journal of Educational Psychology,
1960, 57, 47-51.

Chall, Jeanne S. Readability: an appraisal of research and application. 
Ohio State University, Bureau of Educational Research Monographs, 
1958, 34. 










Chandler, T. A. Reading disability and socio-economic status. Journal 
of Reading, 1966, 10, 5-21. 

Conant, Margaret M. The construction of a diagnostic reading test. 
New York: Teachers College Press, Columbia University, 1942.

Davis, F. B. Fundamental factors of comprehension in reading. Un- 
published doctoral dissertation, Harvard University, 1941. 

Edwards, D. L. The relation of concept of reading to intelligence and 
reading achievement of fifth grade children. Unpublished doctoral dis- 
sertation, University of Buffalo, 1962. 

Gans, Roma A. A study of critical reading comprehension in the inter- 
mediate grades. New York: Teachers College Press, Columbia Uni- 
versity, 1940. 

Goodman, K. S. Analysis of oral reading miscues: applied psycho- 
linguistics. Reading Research Quarterly, 1969, 5, 9-30. 

Goodman, K. S., Olsen, H. C., Calvin, Cynthia M., & Vanderlinde, L.
Choosing materials to teach reading. Detroit: Wayne State University
Press, 1967.

Goodson, R. A. The development of three instruments to aid in the 
analysis of teacher practices, problems and theoretical beliefs concern- 
ing the teaching of reading in the later elementary grades. Unpub- 
lished doctoral dissertation, Columbia University, 1965. 

Hall, W. E., & Robinson, F. P. An analytic approach to the study of 
reading skills. Journal of Educational Psychology, 1945, 36, 429-42. 

Harris, C. W. Measurement of comprehension of literature: II. studies 
of measures of comprehension. School Review, 1948, 56, 332-42. 

Hester, Kathleen B. Classroom problems in the teaching of reading. 
Elementary School Journal, 1953, 54, 84-87. 

Hill, E. H., & Giammatteo, M. C. Socio-economic status and its relationship
to school achievement in the elementary school. Elementary
English, 1963, 40, 265-70.

Holmes, J. A. Speed, comprehension, and power in reading. In E. P. 
Bliesmer & R. C. Staiger (Eds.), Problems, programs and projects in 
college-adult reading. Yearbook of the National Reading Conference, 
1962, 11, 6-14. 

Holmes, J. A., & Singer, H. Theoretical models and trends toward more 
basic research in reading. Review of Educational Research, 1964, 34, 
127-55. 










Holmes, J. A., & Singer, H. The substrata-factor theory: substrata fac- 
tor differences underlying reading ability in known groups at the high 
school level. (Final report covering contracts No. 538, SAE-8176 and 
538 A, SAE-8660) Washington, D. C.: U. S. Government Printing 
Office, 1966. 

Hughes, Mildred C. Sex differences in reading achievement in the ele- 
mentary grades. In Helen M. Robinson (Ed.), Clinical studies in read- 
ing. Supplementary Educational Monographs, 1953, 77, 102-06. 

Hughes, M. M., et al. Development of the means for the assessment of 
the quality of teaching in elementary schools. (Final report of Coopera- 
tive Research Project No. 353) Washington, D. C.: U. S. Department of 
Health, Education, and Welfare, 1959. 

Hunt, L. C. A further study of certain factors associated with reading 
comprehension. Unpublished doctoral dissertation, Syracuse University, 
1952. 

Johnson, R. H., & Bond, G. L. Reading ease of commonly used tests. 
Journal of Applied Psychology, 1950, 34, 319-24. 

Klineberg, O. Negro intelligence and urban residence. In T. M. Newcomb
& E. L. Hartley (Eds.), Readings in social psychology. New
York: Holt, 1947. Pp. 24-32.

Langsam, Rosalind. A factorial analysis of reading ability. Journal of 
Experimental Education, 1941, 10, 57-63. 

Lennon, R. T. What can be measured? The Reading Teacher, 1962, 15, 
326-37. 

Lennon, R. T. Testimony of Dr. Roger T. Lennon as expert witness on 
psychological testing. New York: Harcourt, Brace, & World, 1968.

Levy, J. Readability level and differential test performance, a language 
revision of the study of values. Journal of Educational Psychology,
1958, 49, 6-12. 

Maney, Ethel Swain. Literal and critical reading in science. Unpub- 
lished doctoral dissertation, Temple University, 1952. 

Mills, R. E., & Richardson, Jean R. What do publishers mean by 
“grade level”? The Reading Teacher, 1963, 16, 359-62. 

Preston, R. C. Reading achievement of German and American children. 
School and Society, 1962, 2, 350-54. 

Purcell, Barbara A. Methods of teaching reading: a report of a 
tri-state survey. Elementary School Journal, 1958, 58, 449-53. 








Ramsey, W. C. Will tomorrow’s teachers know and teach phonics?
The Reading Teacher, 1962, 15, 241-45. 

Raygor, A. L. Problems in the substrata-factor theory. Reading Re- 
search Quarterly, 1966, 1 (3), 147-50. 

Russell, D. H., & Fea, H. R. Validity of six readability formulas as 
measures of juvenile fiction. Elementary School Journal, 1951, 52, 136- 
44. 

Ryan, Ellen Bouchard, & Semmel, M. I. Reading as a constructive 
language process. Reading Research Quarterly, 1969, 5, 59-83. 

Schubert, D. G. Teachers and word analysis skills. Journal of De- 
velopmental Reading, 1959, 2, 62-64. 

Sears, Pauline S. The effect of classroom conditions on the strength of 
achievement motive and work output on elementary school children. 
(Report of Cooperative Research Project No. 873) Washington, D. C.: 
U. S. Office of Education, 1963. 

Sheldon, W. D. Specific principles essential to classroom diagnosis. 
The Reading Teacher, 1960, 14, 2-8. 

Sheldon, W. D., & Carrillo, L. Relation of parents, home and certain 
developmental characteristics to children’s reading ability. Elementary 
School Journal, 1952, 52, 262-70. 

Sochor, E. Elona. Literal and critical reading in social studies. Un- 
published doctoral dissertation, Temple University, 1952. 

Spache, G. D., & Baggett, Mary E. What do teachers know about 
phonics and syllabication? The Reading Teacher, 1965, 19, 96-99. 

Sparks, J. N., & Mitzel, H. E. A reaction to Holmes’ basic assump- 
tions underlying the substrata-factor theory. Reading Research Quar- 
terly, 1966, 1 (3), 137-45. 

Spaulding, R. L. Achievement, creativity, and self-concept correlates 
of teacher-pupil transactions in elementary school classrooms. (Report 
of Cooperative Research Project No. 1352) Washington, D. C.: U. S. 
Office of Education, 1963. 

Stoker, H. W., & Kropp, R. P. The predictive validities and factorial 
context of the Florida state wide ninth-grade testing program battery. 
Florida Journal of Educational Research, 1960, 1, 105-14.

Swanson, C. E., & Fox, H. G. Validity of readability formulas. Jour- 
nal of Applied Psychology, 1953, 37, 114-18. 

Thurstone, L. L. Note on a reanalysis of Davis’ reading tests. Psy- 
chometrika, 1946, 11, 185-88. 














Traxler, A. E. A study of the Van Wagenen-Dvorak Diagnostic Examination
of Silent Reading Abilities. Educational Records Bulletin, 1941,

Traxler, A. E., & Spaulding, Geraldine. Sex differences in achievement.
Educational Records Bureau, 1954, 63, 69-80.

Wade, E. W. The construction and validation of a test of ten teacher
skills used in reading instruction, grades two to five. Unpublished
doctoral dissertation, Indiana University, 1960.

Wallen, N. E., & Wodtke, K. H. Relationships between teacher characteristics
and student behavior—part 1. (Report of Project ...)
Washington, D. C.: U. S. Office of Education, 1963.

Ware, Florence E. Effect on reading achievement of undertesting pupils
in low third grade. California Journal of Educational Research, 1956, 7,
22-24.



Wilt, Miriam E. A study of teacher awareness of listening as a factor
in elementary education. Journal of Educational Research, 1950, 43,
626-36.










Test references 



An asterisk after a test listing indicates the test is included 
in the Guide to Tests and Measuring Instruments in Reading 
which appears after Chapter 6. 

California Achievement Tests E. W. Tiegs & W. W. Clark. Monterey, 
Calif.: California Test Bureau, 1933, rev. 1963. 



Cooperative English Test: Reading Comprehension Test C. Derrick, D. 
P. Harris, & B. Walker. Princeton, N. J.: Educational Testing Service, 
Cooperative Test Division, 1940, rev. 1960*. 

Gates Reading Survey A. I. Gates. N. Y.: Bureau of Publications, 
Teachers College, Columbia University, 1939, rev. 1960.

Gray Oral Reading Test W. S. Gray. Indianapolis: Bobbs-Merrill Co., 
1963, rev. 1967*. 

Stanford Achievement Test: Reading Tests T. L. Kelley, R. Madden, E.
F. Gardner, & H. C. Rudman. N. Y.: Harcourt, Brace, & World, 1922,
rev. 1964*.






Problems in measuring reading sub-skills 







Standardized tests, the most common device for measuring 
reading ability, divide reading into a number of sub-skill areas. 
In every instance, this division is arbitrary since there is almost 
no research evidence supporting it. However, since most stand- 
ardized reading tests are organized around separate sub-skill 
areas and since the teacher has to work with existing tests, the 
problems of measurement are discussed in this chapter as they 
apply to the most commonly found sub-skill divisions. These
sub-skill areas include reading vocabulary, rate, comprehension, 
and rate of comprehension. The discussion of these sub-skill 
areas is organized so that the reader is presented with a review 
of methods used for measuring each skill and the problems in- 
volved in such measurement. This is followed by an examina- 
tion of validity and reliability studies relevant to measuring each 
skill and by projections for further research.



Reading vocabulary 

Over ninety per cent of group survey tests of silent reading 
ability include a separate measure of reading vocabulary. The 
inclusion of such a measure is, on the surface, highly reasona- 
ble. In fact, it has been suggested by several reading authorities 
(Karlin, 1964; Wilson, 1967) that vocabulary scores provide 
teachers with diagnostic insight into the reading ability of stu- 
dents. However, the wide array of procedures used to measure 
reading vocabulary casts doubt on whether reading vocabulary
is a specific sub-area of reading. 

Kelley and Krey (1934) studied standardized vocabulary 
and reading tests and delineated 26 different approaches for 
measuring knowledge of word meanings. In a list adapted from 
Dolch (1927), they categorized the approaches as follows: 

1] Unaided recall
   A. Checking for familiarity
   B. Using words in a sentence
   C. Explaining the meaning
   D. Giving a synonym
   E. Giving an opposite

2] Aided recall
   A. Recall aided by recognition
      1. Matching tests
      2. Classification tests
      3. Multiple-choice tests
         a. Choosing the opposite
         b. Choosing the best synonym
         c. Choosing the best definition
         d. Choosing the best use in sentences
      4. Same-opposite tests
      5. Same-opposite-neither tests
      6. Same-different tests
   B. Recall aided by association
      1. Completion test
      2. Analogy test
   C. Recall aided by recognition and association
      1. Multiple-choice completion test
      2. Multiple-choice substitution test

(Kelley & Krey, 1934, p. 103) 

In conclusion, Kelley and Krey stated that there did not seem to 
be any one best technique for measuring word meaning knowledge.
They added that with present instruments there was little
hope of accurately determining the extent or the quality of the
reading vocabulary of an individual. 

An attempt to analyze the behavior involved in a child’s 
knowledge of the meaning of a word was undertaken by Cron- 
bach (1942). Cronbach’s categorization of such behavior can 
be presented as follows: 

1] Generalization — Can the child define the word?

2] Application — Can the child recognize an illustration 
of the word if properly named by the word? 

3] Breadth of meaning — Can the pupil recall different 
meanings of the word? 

4] Precision — Can the pupil apply the term correctly in 
all possible situations? 

5] Availability — Does the child actually use the word? 

The methods of measuring vocabulary listed by Kelley and 
Krey and the categories of behaviors involved in vocabulary 
skills devised by Cronbach suggest that the measurement of 
reading vocabulary is indeed a complex task. If one just looks 
at standardized reading tests, it is obvious that many sub-tests 
are labelled vocabulary. However, on closer examination it be- 
comes hard to believe that all these sub-tests of vocabulary are 
measuring the same thing since the procedures used and the 
types of behaviors sampled vary from sub-test to sub-test. For 
example, the Gates-MacGinitie Reading Test includes a reading 
vocabulary sub-test of fifty items to be completed within fifteen 
minutes; for each item, a word is given in isolation and the examinee
is asked to select the best synonym from five alternatives.
The Diagnostic Reading Tests: Upper Level differs
from the Gates-MacGinitie: it has a vocabulary sub-test consisting
of sixty items to be completed within ten minutes; for each
item, a definition is given and the examinee is to select from five 
alternatives the word defined. The vocabulary sub-test of the 
Nelson-Denny Reading Test: Vocabulary-Comprehension-Rate 
contains one hundred items to be completed within ten minutes; 
for each item, an incomplete definition statement is given and
the examinee is to select from five alternatives the best word to 
complete the definition. In each of these three vocabulary tests
the task is quite different. Time is a stringent factor on one test
but not on the other two; the words to be defined are in isolation
in one test but not in the other two; and, on one test, the match
is between a word and a synonym, while on the other two it is
between a word and a definition.
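
The time pressure in the three sub-tests can be made concrete with a
little arithmetic. The sketch below, written in Python purely for
illustration, divides each time limit by the number of items. It uses
only the item counts and time limits quoted above; the function name
seconds_per_item is an assumption made for this example and does not
come from any of the tests.

```python
# Item counts and time limits as quoted in the text above.
SUBTESTS = {
    "Gates-MacGinitie": (50, 15),            # (items, minutes)
    "Diagnostic Reading Tests": (60, 10),
    "Nelson-Denny": (100, 10),
}


def seconds_per_item(items, minutes):
    """Average time available per item, in seconds (hypothetical helper)."""
    return minutes * 60 / items


for name, (items, minutes) in SUBTESTS.items():
    print(f"{name}: {seconds_per_item(items, minutes):.0f} seconds per item")
```

On these figures the average allowance falls from eighteen seconds per
item on the first test to ten on the second and six on the third.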

The confusion caused by the diversity of methods for mea- 
suring reading vocabulary poses a serious problem for the test 
consumer: which reading vocabulary sub-test should be selected 
from the many available? Assuming that reading vocabulary is 
a distinct and measurable sub-skill of reading, the problem of 
test selection can be mitigated somewhat if, when choosing a
test, the goals of the test are matched with the instructional
goals. For example, if a variety of procedures have been used 
to foster vocabulary growth, then the vocabulary test used to as- 
sess this growth should include a wide range of tasks. What is 
important is that the test sample the same behaviors as those
developed through the instructional program. This is not teaching
for a test; rather, it is selecting a test which measures growth
toward the specified objectives of the reading program.

While the use of tests which do not measure reading vocabulary
as it has been developed in the classroom constitutes a
prevalent problem in measuring vocabulary, there are other
important problems which should be considered. A number of
vocabulary tests at every grade level impose such severe time
limits that many students are unable to complete the test. This
happens most often in upper level reading tests. Time limits do tend to
increase the reliability of any given test, but, at the same time, 
they reduce the test’s validity as a vocabulary measure. When 
speed and vocabulary are tested together, what is being mea- 
sured is some unknown combination of the two, rather than just 
reading vocabulary or just reading speed. A test which con- 
founds the two cannot validly assess the reading vocabulary of a 
slow but methodical reader.

The inclusion of a speed factor in measuring reading vocabulary
may partially account for the significant improvement of
an experimental group over a control group reported in many
research studies. A speed factor in measurement also allows
the well-known Hawthorne effect to exert more influence than
usual. Such an effect would be easy to demonstrate. For example,
a Hawthorne effect could be built into two parallel studies
of reading vocabulary improvement: in the first study, the
vocabulary tests would be untimed; in the second study, the 
tests would be timed so that only the top quarter of the class 
would have enough time to finish. The test results might well 
show more significant reading gains in the study in which the 
tests were timed. 

Another source of confusion in vocabulary measurement 
arises from the vast number of tests which attempt to assess vo- 
cabulary skills by presenting words in isolation and directing stu- 
dents to select the “best” synonym from a number of alterna- 
tives. This method does not reflect reading vocabulary skills as 
an individual actually applies such an ability in a practical read- 
ing situation. Goodman (1968) has pointed out that reading is 
a psycholinguistic guessing game and that a student relies quite 
heavily on the semantic and syntactic context clues of a reading 
passage in determining the meaning and pronunciation of a 
word. H. L. Smith (1956), another well-known linguist, has 
seriously questioned the validity of defining any word out of 
context. 

Still another problem in measuring vocabulary improvement 
relates to the use of so-called equivalent forms of tests. The 
equivalency of most reading vocabulary tests rests on a
statistical rather than a logical basis. An analogy may be drawn
from the high jumping and long jumping ability of students. It 
is possible to determine the distance (long jumping) and height 
(high jumping) ability for a group of students and compute the 
equivalency between the two; the raw scores from each could 
then be changed to grade equivalents and presented in a table. 
If the long jump measure is used as an indication of improved
performance following a semester of high jumping classes, this
would constitute a statistically equivalent test but not a logically 
equivalent test because the content of the two measures would 
not be the same. This same situation is true of vocabulary tests 
despite test developers’ attempts to control the content va- 
lidity of equivalent forms. If two forms of a test were equiva- 
lent, a raw score of fifty on one test would mean exactly the 
same thing as a raw score of fifty on another form. A number 
of factors about the nature of the words used on each test, how- 
ever, make any two forms of a vocabulary test far from equiva- 
lent: relative word length, subject matter, part of speech, diffi- 
culty of discrimination among alternatives, word lists used for 
the selection of items, poor items, etc. Very few studies have 
investigated these types of problems related to test equivalency. 
However, a start has been made by Hinton (1959) who found 
that the sub-tests of two forms of the Diagnostic Reading Tests 
were quite unequal in difficulty. 

Validity of reading vocabulary measures 

Are standardized tests of reading vocabulary valid measures 
of the quality or depth of a student’s vocabulary power? 
Several researchers have dealt with this question. Dolch and 
Leeds (1953) examined five tests of reading vocabulary: the 
Thorndike, the Gates, the Durrell-Sullivan, the Stanford, and 
the Metropolitan. They concluded that the tests do not meas- 
ure depth of word meaning because they: 1] ignore all but 
the most common meaning of words; and 2] when synonyms 
are used, a very indefinite amount of knowledge is tested. 
Dolch and Leeds suggested that the most serious weakness of 
the five tests is that they fail to recognize that words have differ- 
ent meanings for different people and that there is no one 
“meaning” for any particular word. Instead, they claimed, 
each word has a variety of meanings. While these points are 
well taken, they are severely limited by the lack of statistical
evidence and specific validity criteria.

The fact that most reading vocabulary tests are quite similar
to one another regardless of their intended grade level use has 
led researchers to question the validity of using the same type of 
vocabulary test at all grade levels. Feifel and Lorge (1950) 
examined the types of oral vocabulary responses of 900 children 
between the ages of 6 and 14 and found: 1] older children 
(ages 10 to 14) more often use a synonym-type definition than 
younger children (ages 6 to 9), and 2] younger children supply
use-type and description-type definitions more than older
children. If spoken vocabulary can be used as an indication of
reading vocabulary development, Feifel and Lorge’s study could
be used as a basis for the development of differentiated proce- 
dures for measuring reading vocabulary at different age levels. 

Kruglov (1953), in investigating the quality of reading vo- 
cabulary responses of students at various age levels, adminis- 
tered a ten-item five-option multiple-choice test to pupils in 
grades three, five, seven, and eight and to a group of college 
graduates. For each test item, three or four options were correct
but were of different qualitative levels. From the responses, Kruglov
concluded that: 1] there is an increase in the choice of a syno- 
nym as the correct response for older students; 2] there is a 
significant decrease in the per cent of repetition, illustration, and 
inferior explanation-type responses between students in grades 
three through eight and college graduates; and 3] there are 
no differences in the use of description-type responses and 
explanation-type responses between any of the groups tested. 

The preceding studies present rather conclusive evidence 
that there are qualitative differences in students’ responses to 
vocabulary items: younger students tended to choose more con- 
crete definitions (description and use) while older students 
chose more abstract definitions (synonyms and classifications). 
The ability of present vocabulary tests to measure these differ- 
ences in student responses has been studied by several of the 
preceding authors who consistently pointed out that the tests are
inadequate for measuring all but the very lowest level of
vocabulary ability.

Russell (1954) made various suggestions for improving the
validity of reading vocabulary measures. The most serious 
problem in testing vocabulary, according to Russell, is that of 
determining verbalization — whether or not students supply cor- 
rect answers without a real understanding of the concept to 
which they are responding. As have many others in the field of 
reading such as Kruglov (1953) and Dolch and Leeds (1953),
Russell recommended that words to be used as test items be 
placed in as meaningful a situation as possible and that vocabu- 
lary tests be developed which evaluate the quality of students’ 
reading vocabulary. Such measuring devices would include 
items designed to assess students’: 1] precision in knowledge 
of words, e.g., the ability to discriminate between words such as 
valley and canyon; 2] breadth of vocabulary indicated by the 
number of words recognized and knowledge of multiple meanings
of words such as run and strike; and 3] ability to use
vocabulary in speaking, writing, and reading.

Another point of controversy has been the usefulness of 
standardized reading tests in determining the size of a student’s 
vocabulary. Mary K. Smith (1941) conducted a number of
studies which have shown that the usually accepted estimates of
the size of students’ listening vocabulary may be far too low
because the test constructors used abridged dictionaries
in selecting the words included in their tests. Estimates of
vocabulary size based on a sampling from unabridged dictionaries by
Smith indicated that the average first grader knows 24,000 dif- 
ferent words, the average sixth grader knows 49,500 words, the 
average high school student knows 80,000 words, and the aver- 
age university student knows 157,000 different words. Most 
other estimates (Buckingham & Dolch, 1936; Rinsland, 1945; 
Thorndike, 1931; Seashore & Eckerson, 1940) of vocabulary 
size, upon which instructional materials and tests have been 
based, are much lower than this. Bryan (1953) claimed that 
the estimates by Smith may also be too low. To determine
vocabulary size, Bryan used three vocabulary tests: a free
association test, a stimulus-response test, and a multiple-choice
recognition test. The estimates of the number of words that children
knew were larger when the following methods were used:
1] testing the children in a greater number of socio-economic
areas of the country; 2] testing children more often during the
year so that various holidays, seasons, and recreational activities
would serve to recall additional words; 3] reconstructing for
children a greater number of their common areas of experience.

The studies cited thus far cast considerable doubt on the 
ability of present standardized tests to measure the qualitative 
or quantitative aspects of vocabulary. Perhaps a more impor- 
tant question is whether standardized tests can validly measure 
reading vocabulary as distinct from other reading skills. Most 
validity studies of reading skills have used correlation
techniques to point out that there is so much overlap between
sub-skills that almost all of the variance on the standardized
reading tests is taken up by some kind of general factor.

V. H. Hughes (1953) correlated scores of 332 fifth graders
on tests of word meaning and reading comprehension with
scores made on tests of other aspects of language ability such as
spelling, punctuation, capitalization, language usage, paragraph
organization, and sentence sense. Despite the fact that the
study was not designed specifically to isolate sub-test variance,
Hughes found that there is a very high degree of overlap
between all the tests of language skills.

Another study which emphasized the lack of discriminant 
validity for vocabulary tests was conducted by Garlock, Dollar- 
hide, and Hopkins (1965). The Wide Range Achievement 
Test (a reading recognition vocabulary test) and the Gilmore 
Oral Reading Test were found to provide almost identical,
interchangeable information. However, the findings of Garlock and
his colleagues are somewhat limited because of the atypical 
population of mentally retarded pupils they studied. 

Farr (1968), in a convergent-discriminant validity study of
three upper level reading tests, reported that none of the three
sub-tests of reading vocabulary evidenced any discriminant
validity (the validity of tests as measures of distinct skills or
abilities). For example, the vocabulary test of the Nelson Reading
Test correlated .76 with the vocabulary sub-test of the Califor- 
nia Reading Test; however, the vocabulary sub-test of the Nel- 
son correlated with the comprehension sub-test of the California 
test also at .76; and the vocabulary sub-test of the California 
test correlated at .73 with the comprehension sub-test of the 
Nelson test. Certainly the specific (discriminant) validity of 
the sub-tests of vocabulary as measured by these two tests 
should be seriously questioned. 
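
The logic of this comparison can be sketched in a few lines of Python.
The example assumes one common reading of the convergent-discriminant
approach: a sub-test shows discriminant validity only if its
correlation with another measure of the same skill exceeds its
correlations with measures of a different skill. The three
coefficients are those quoted above; the function name and the margin
parameter are hypothetical and are not taken from Farr's study.

```python
# Correlations reported by Farr (1968), as quoted in the text above.
convergent = 0.76  # Nelson vocabulary with California vocabulary
cross_trait = {
    "Nelson vocabulary with California comprehension": 0.76,
    "California vocabulary with Nelson comprehension": 0.73,
}


def shows_discriminant_validity(same_trait_r, different_trait_rs, margin=0.0):
    """True only if the same-trait correlation exceeds every
    different-trait correlation by more than `margin` (an assumed rule)."""
    return all(same_trait_r - r > margin for r in different_trait_rs)


print(shows_discriminant_validity(convergent, cross_trait.values()))  # False
```

Because the vocabulary sub-tests correlate no more highly with each
other than with a comprehension sub-test, the check fails under this
rule, which is the pattern the paragraph above describes.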

Reliability of measures of reading vocabulary 

Research on the reliability of reading vocabulary tests is a 
rarity. The reliability coefficients provided by most test pub- 
lishers have been based on an internal consistency procedure. 
In reporting them, the test publishers often fail to describe in 
detail the population used in determining the coefficients. This 
kind of omission seriously limits their usefulness. 

Two factors which have been shown to influence the reliabil- 
ity of vocabulary test scores are related to directions on guessing 
and the timing of tests. Swineford and Miller (1953) investi- 
gated the effects of three sets of directions on the amount of 
guessing on reading vocabulary tests. Students either were told 
1] they should avoid guessing, 2] they should guess even 
when they did not know the answer, or 3] they were given no 
directions regarding guessing. The group which was told not to 
guess responded to substantially fewer items than either of the 
other two groups. Swineford and Miller found that too many 
difficult items on a test or too much guessing seriously reduces 
the test’s reliability. 

Slakter (1967) has shown that if examinees are discouraged 
from guessing because a penalty has been imposed for it, the 
test scores of the examinees reflect the risk-taking of the exami- 
nees as well as their achievement. If the test maker is more 
concerned with validity than reliability, Slakter urged that he 
construct tests in which examinees are encouraged to answer all
questions. If such directions are used, it is crucial that the test
be of appropriate difficulty. A test which is extremely difficult 
for a particular group and in which students are encouraged to 
guess would have an exceedingly low reliability. 

Boag and Neild (1962) explored the effects of timing on the 
reliability of the vocabulary section of the Diagnostic Reading 
Test. They found that the relative standings of some high 
school students changed when they were given additional time 
on the vocabulary test. Thus, it was concluded that speed and 
power of reading scores should not be used interchangeably.
One additional finding was that changes in relative standings 
under timed and untimed conditions occur with considerably 
greater frequency through the middle range of scores than they 
do at either extreme of the distribution. 

The measurement of reading vocabulary is far from an exact 
science. The studies reviewed here indicate that there is confu- 
sion about how to measure reading vocabulary or whether there 
is a unitary trait which can be labelled reading vocabulary. 
Perhaps the most important conclusion that can be derived from 
this review is that there is a lack of evidence to support the con- 
tention that vocabulary can be measured as a distinct sub-area 
of reading. 

Reading vocabulary: needed approaches 

The most important research need in measuring reading vo- 
cabulary is the development of tests based on sound theoretical 
and empirical evidence concerning the components of reading 
ability (Kingston, 1965). While it has been logically argued 
that a person can know the meanings of many words he reads 
and, at the same time, lack the ability to weave these meanings 
together in reading sentences and paragraphs, this contention 
has no empirical basis. Until such evidence is forthcoming, any 
attempts to “diagnose” reading vocabulary as distinct from 
reading comprehension or other areas should proceed on very 
cautious grounds. It seems possible that research may reveal
that the logical analyses of sub-skills constituting reading
behavior have been quite inaccurate and that such skills need to be
re-examined from a radically different point of view. This 
conclusion would not only have serious implications for test de- 
sign and content, but also for the development of reading pro- 
grams and instructional materials. 

If past test developers’ and researchers’ attempts to measure 
vocabulary as a distinct sub-area prove to be successful, the 
study of the qualitative differences in reading vocabulary re- 
sponses at different age levels should then become the focal 
point of future research. Studies already carried out have 
indicated that the usual methods do not supply adequate 
information concerning many aspects of reading vocabulary. 
An initial undertaking along these lines might well take the 
form of verifying Kelley and Krey’s (1934) 26 methods for 
measuring reading vocabulary. 

Estimates of the size of students’ vocabularies need updat- 
ing. The availability of computer techniques as well as newly 
developed sampling procedures certainly would facilitate the 
task of developing grade and age level vocabulary lists. Such 
studies should be able to profit from the early work of Thorn- 
dike (1931) and Seashore and Eckerson (1940) and more re- 
cently from that of Dale (1949) and Bryan (1953). 

Speed of reading 

For the purposes of the review which follows, reading speed 
is defined as the number of words read within any given time 
period. Comprehension as it relates to reading speed is dis- 
cussed separately later in this chapter. 

The development of faster reading speed is an important, if 
not central, goal of many high school and college reading
programs. This is certainly justifiable on both logical and
empirical grounds: the large volume of reading required in most
academic and vocational endeavors is reason enough for the
development of speed reading programs. The realization that many
high school and college students read at a rate below their
potential and that reading speeds can easily be increased has led to
the rapid expansion of these programs over the past decade. 

Because of the recent emphasis on reading speed, a number 
of standardized tests — Burnett Reading Series: Survey Test, 
Diagnostic Reading Tests, and the Gates-MacGinitie Reading 
Tests — have been developed. Most of these include some type 
of comprehension check for two reasons: 1] the belief that 
faster reading results in better comprehension; 2] the belief 
that reading speed is unimportant unless some minimal level of 
comprehension is maintained. That faster reading results in 
better comprehension is not at all substantiated by research evi- 
dence. In fact, the validity of correlations between reading 
speed and comprehension has been a point of controversy for 
many years. In the 1930’s, Eurich (1930), Anderson and Tin- 
ker (1932), and many others began reporting moderately high 
correlations between rate and comprehension. In 1942, how- 
ever, Stroud pointed out that most of the early studies relating 
speed to comprehension were invalid because they were based 
on comprehension scores derived from timed tests and, therefore,
the comprehension score was contaminated by a speed factor.

A study by Flanagan (1937) emphasized this point.
Flanagan collected two scores for subjects on a literary compre- 
hension test: a level of comprehension score and a rate of com- 
prehension score. The level of comprehension score was based 
on the average number of comprehension items answered cor- 
rectly on four twenty-item scales. The rate of comprehension 
score was the total number of items answered correctly on all 
eighty items minus a correction for guessing. Flanagan com- 
puted a positive correlation of .77 between these two scores, 
thus indicating a great deal of trait similarity. However, when 
he correlated a rate of reading score (determined by the total 
number of items completed within a time limit) with the level of 
comprehension score, the correlation was only .17. 

The belief that reading speed is unimportant unless some
minimal level of comprehension is maintained seems quite
logical. Certainly a reader should not increase his reading
speed if, by so doing, he is unable to comprehend what he is
reading. However, if reading speed and comprehension are 
unrelated, there appears to be justification for measuring speed 
separately from comprehension. Many students who are slow 
readers are good comprehenders, but there are also many slow 
readers who are poor comprehenders. On the other hand, both 
good and poor reading comprehension occur among relatively
fast readers. Stroud (1969) has emphasized that there is as 
much point in one’s reading rapidly what he does not under- 
stand as there is in his reading it slowly. 

Applying comprehension checks to measure reading speed 
has resulted in another problem. Comprehension scores have 
been used on many reading rate tests as if they formed a perfect 
ratio scale, that is, as if there were an absolute zero point on the 
test. On one such test, an individual’s score is determined by 
multiplying his reading rate by his per cent of comprehension.
The reason for this, according to the test developer, is that the 
reading speed score should be reduced by the reader’s level of 
comprehension. The invalidity of this procedure is easily illus- 
trated through the following hypothetical situation. Suppose an 
examinee reads 300 words per minute and scores 85 per cent on 
comprehension. Multiplying the two would result in a reading 
speed score of 255 words per minute. If the examinee merely 
reads the title of the selection and then reports that he had read 
the material, his speed would be taken as being approximately 
20,000 words per minute. A subsequent comprehension score 
of 55 per cent would result in a rate of reading score of 11,000 
words per minute. Such a comprehension score without read- 
ing would not be unusual since examinees can always answer 
several questions correctly on the basis of their prior knowledge 
and several other items can be guessed correctly. The main 
point here is that combining comprehension with measures of 
reading rate detracts from the validity of measuring speed of 
reading. 
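
The arithmetic of the hypothetical situation above can be reproduced
in a short Python sketch. The function name combined_rate_score is an
assumption made for illustration; the computation itself is simply
the product of rate and per cent of comprehension described in the
text, applied to the figures given there.

```python
def combined_rate_score(words_per_minute, percent_comprehension):
    """Rate score as the test described above computes it:
    reading rate multiplied by per cent of comprehension."""
    return words_per_minute * percent_comprehension / 100


careful_reader = combined_rate_score(300, 85)   # reads the whole selection
title_only = combined_rate_score(20_000, 55)    # reads only the title

print(careful_reader)  # 255.0
print(title_only)      # 11000.0
```

The examinee who read nothing but the title receives a score more
than forty times that of the examinee who read the selection, which
is the absurdity the example is meant to expose.
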

The failure of test developers to provide specific purposes
for reading a given selection also poses a measurement problem. 
Several studies (McDonald, 1966; Sheldon, 1955) have shown 
that purpose can have a strong influence on rate of reading. 
Most tests of reading speed merely require that the examinee 
read the material and answer the questions that follow. 
McDonald (1958) attempted to deal with this problem by de- 
veloping a reading versatility inventory. McDonald’s inventory 
tries to measure the reader’s ability to change his reading speed 
and approach to suit his purpose for reading. The inventory is 
composed of three reading selections each containing different 
directions. The first set of directions asks the student to read 
the material carefully; the second asks him to read rapidly; and 
the third asks him to skim. McDonald reported that flexible 
readers complete each succeeding part of the inventory 1.8 to 2 
times faster than the preceding one. Two important points 
about McDonald’s inventory should be made: 1] while both 
speed and comprehension scores are determined for each read- 
ing selection, there is no attempt to combine the scores; 2] the 
subjects in McDonald’s study were not allowed to look back 
when answering the questions. Whether or not examinees 
should be permitted to look back at the reading selection when 
answering test items is a controversy in measuring reading com- 
prehension which is discussed later. 

Despite McDonald’s attempts to build a measure of reading 
flexibility, there is little evidence that most students have any 
ability to adjust their reading rate to suit specific purposes. 
McDonald (1966), in an overview of research studies, con- 
cluded that the vast majority of readers are untrained in reading 
flexibility and, therefore, do not change their reading rate to any 
great extent even when instructed to read for different purposes. 
In a more recent study of fourth graders, Gifford and Marson 
(1966) supported McDonald’s conclusion. The subjects in 
their study did not vary their reading speed to suit the specific 
purposes of reading for main ideas and details. The fact that 
readers do not adjust their speed in different situations should
not be taken as an indictment of reading tests; if anything, it
points out the shortcomings of reading programs which develop 
such inflexible readers. If reading tests were to include more 
specific directions about purposes for reading and would vary 
these purposes, more reading programs might begin to teach 
flexibility. 

Several studies which test developers as well as consumers 
should particularly heed have focused on the effects of typogra- 
phy on reading speed. In Hvistendahl’s (1965) study, subjects 
were presented the same magazine page in four different for- 
mats: one with paragraph heads, another with boldface lead-ins, 
another with boldface paragraphs, and a final one containing no 
typographical aids. Each was also presented in two- and 
three-column format. Rate of reading was determined by ask- 
ing the subjects which page they thought they could read fastest. 
The results were statistically significant in favor of all the pages 
containing typographical aids, but there were no significant dif- 
ferences in the use of two- versus three-column format. These 
findings are limited, however, because of the criteria used to de- 
termine rate of reading. 

The effect of print size on the reading speed of first, second, 
and third graders was examined by McNamara, Patterson, and 
Tinker (1953). The print used ranged in size from 8 to 24 
point. Little difference in the rate of words read for any of the 
type sizes was found at the first two grade levels. In grade
three, there was a definite trend indicating students read
material set in 10, 12, and 14 point type faster. McNamara,
Patterson, and Tinker therefore advised that, so far as rate is
concerned, type size need not be a consideration in selecting
materials for the first two grades, because rate is not an
important factor in reading instruction in these grades. There are
other factors which do make size of print important at this age. 
As reading skills develop to the level found at the third grade 
where rate is more important, size of type does have an effect 
on speed and should be considered in selecting materials. 
While size of print may exert the greatest single influence on
rate of reading, other factors such as type style, line width, page
format, color of print and background, illumination, and the 
reading situation also should be considered (Tinker, 1963). 

A number of studies have pointed out that there is a maxi- 
mum limit in the number of letters that can be effectively per- 
ceived by a reader at any one time (Newman, 1966, p. 272). 
According to a review of the research made by Stroud (1942), 
this limit is generally thought to be five to eight letters or about 
six to eight fixations per line by the average mature reader (An- 
derson, 1937; Buswell, 1922). Newman’s (1966) study was 
concerned with determining the reader’s lower limits in effec- 
tively perceiving letters. Using rather unique equipment, New- 
man contended that when the number of letters presented to a 
reader at a single exposure falls below a minimal level, the sub- 
ject does not receive enough contextual help from surrounding 
letters and reading is disrupted. 

Validity and reliability of tests of reading speed 

The validity of reading speed tests has been questioned in a 
number of instances. The most significant factor affecting
validity is the attempt to combine reading speed with
comprehension. This factor has resulted in the construction of
tests of rate or speed of comprehension, not speed of reading.

The effect on reading rate scores of the difficulty level and 
interest appeal of the reading selections included in a particular 
test is in itself a basis for raising the validity question in any 
facet of reading measurement. However, because of the
susceptibility of reading speed tests to the Hawthorne effect, it is
more of a problem in measuring reading rate than in measuring 
other reading skills. One attempt to investigate the relationship 
between reading rate and the interest appeal of reading
selections was undertaken by Bryant and Barry (1961). They
concluded that interest did not significantly influence reading rate in
the case of relatively simple, narrative articles. The procedure 
used by Bryant and Barry involved asking subjects which of two
articles they found more interesting. From a sample of one
hundred cases, two groups of 17 were selected: one group had 
favored the first selection while the other had favored the sec- 
ond. This procedure does not seem valid for selecting materials 
with much positive or negative attraction; the small number of 
students choosing selections as “most interesting” would seem 
to support this contention. Thus, the mildly positive or negative
attitudes which Bryant and Barry found are not surprising, nor
is the lack of significant differences in reading speed.

Significant differences were found in reading speed for vary- 
ing difficulty levels of reading material by Carlson (1951). 
The primary statistical procedure used in the study was a Pear- 
son Product-Moment correlation. As would be expected, all of 
the correlations between reading rate and level of difficulty were 
significant at the .01 level, but they were not large enough for 
any predictive use. Carlson’s study also pointed out the limita- 
tion of measuring reading speed when the difficulty level of the 
reading material is not controlled. The effect of the reading
level of materials on the measurement of reading rate is a
problem which is often overlooked. If a student is unable to read
seventh-grade material, despite the fact that he may find himself 
in a seventh-grade class, a test of reading speed utilizing mate- 
rial of seventh-grade difficulty is probably not a valid measure 
of reading speed for this student. Should a subject’s reading 
speed be determined on material of relatively easy reading? 
Or, should he be reported as having several reading speeds 
depending on the difficulty and interest appeal of the material? 
These questions have not been answered in research studies and 
are generally ignored by most constructors of reading rate tests. 

Another problem affecting the validity of reading speed 
scores is the apparent “slack” that most readers seem to have in 
normal reading speeds. Laycock (1955) investigated the effect 
of giving students a mental set to read faster without decreasing 
their comprehension. Under these conditions, subjects increased
reading speed by as much as forty per cent. These results
suggest the possibility that supposed gains in reading
speed, following a session of reading improvement classes, may
be due to the new “mental set” students have established. 

Few studies focus on the reliability of tests of reading rate. 
Traxler (1938), for one, studied the relationship between the 
length and the reliability of one of these tests. Seventy-eight 
high school juniors were given two forms of a 177-line reading 
rate test in alternate order. The students were asked to mark 
the line they were reading at the end of each hundred seconds. 
Traxler then correlated the number of lines read at each 
hundred seconds between forms A and B. The correlations 
were significantly higher (.86) for four hundred seconds than 
for one hundred seconds (.62). Traxler concluded that the 
time allowed for most tests of reading rate (one minute to five 
minutes) is too short for high reliability. He called for the de- 
velopment of tests two or three times the length of those extant. 
This same plea is valid today. 
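
Traxler's figures can be set beside the Spearman-Brown prophecy formula, a standard psychometric estimate of how reliability changes when a test is lengthened. The formula is not part of Traxler's study as reported here; it is introduced only as an illustrative assumption, and the function name is hypothetical. Treating the four-hundred-second score as a test four times as long as the one-hundred-second score, a minimal sketch is:

```python
def spearman_brown(r, k):
    """Predicted reliability when a test with reliability r is made k times as long.

    Assumes the added material is parallel to the original test
    (an assumption of the formula, not a finding of Traxler's study).
    """
    return k * r / (1 + (k - 1) * r)


# Reliability reported for one hundred seconds of reading: .62
predicted = spearman_brown(0.62, 4)
print(round(predicted, 2))  # 0.87
```

Under these assumptions the predicted value is close to the .86 that Traxler observed, although cumulative line counts may not strictly satisfy the assumption of parallel segments.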

The most important research need in the measurement of 
reading rate is a thorough analysis of how students develop 
faster reading speeds that can serve as a basis for test construc- 
tion. In particular, the phenomenal reading rates achieved by 
students in some rate improvement programs should be studied 
more closely, especially since some of the rates reported have 
exceeded the physiological limits of the normal progression of 
eye movements across and down the page. Another area de- 
manding further study concerns the difficulty and interest- 
appeal of selections used to measure reading rate. Probably the 
best approach for such research would be through a series of
studies combining purposes for reading, difficulty of selections,
and interest level of selections in a three-way analysis of vari- 
ance. Such an investigation could help to determine the effects 
of each of these factors individually as well as the unique interaction
effects among all three factors. Replication could then
be conducted with a number of different age groups. Finally, 
the reliability of reading rate measures should also prove a fer- 
tile area for future study. Traxler in 1938 provided valuable 
insight into this problem, but since then very little effort has
been expended in this area. The interchangeability of test 
forms, the effects of test length at various age levels, and the ef- 
fects of typography all need to be examined if the reliability of 
reading rate measures is to be improved. 



Reading comprehension 

A review of the factors that should be considered in measuring
reading comprehension reveals that this measurement task is
extremely complex. These factors include the length, interest- 
appeal, subject matter, reading difficulty, and organization of the 
material to be read; the reader’s purpose, mental set, environ- 
mental conditions for reading, and command of basic decoding 
skills; the type of question to be used; and whether examinees 
are allowed to look back at the selection when answering ques- 
tions. Kerfoot (1968, p. 42) stated that the measurement of 
reading comprehension is a “problem of inconsistency in both 
theoretical base and descriptive terminology.” He suggested 
that, to overcome this problem, both researchers and practitioners
should seek to operationally define reading comprehension
in terms of specific reading tasks. Barrett (1968) has provided
a partial response to Kerfoot’s plea for an operational definition
of comprehension by developing a taxonomy of the cognitive
and affective domains of reading comprehension. The
major sections of this taxonomy include literal comprehension, 
reorganization, inferential comprehension, and evaluation and 
appreciation. 

As part of a review of research on reading comprehension, 
Davis (1968) cited an analysis of reading comprehension by 
Richards (1929), which Davis considered perceptive. Among 
the abilities which Richards included in his analysis of reading 
comprehension were literal comprehension, recognizing the wri- 
ter’s mood, comprehending the writer’s tone, and recognizing 
the writer’s purpose. 

There are presently many sub-tests of standardized reading 
tests purporting to measure sub-skills of reading comprehension.
However, the division of comprehension into distinct sub-skill
areas has not been based on any validity studies. Stroud
(1969), realizing this deficiency, suggested that reading comprehension
tests include measures of the reader’s ability to recall
specific facts, generalize, draw inferences, and determine
the author’s purpose. However, Stroud did issue a caution: he
suggested that such sub-tests not be used for diagnostic purposes.
Instead, he urged that the sub-tests be combined.

Ward (1956), in analyzing the results of four tests which he
developed specifically to measure reading comprehension, also 
urged that a variety of approaches be used to measure 
comprehension. The tests Ward designed, however, contained 
much broader elements than those usually thought of as sub- 
skills of reading comprehension; they included speed, flexibility, 
understanding of ideas, and knowledge of words. 

Early attempts to isolate specific reading comprehension
skills employed factor-analysis techniques. Davis (1944) de- 
veloped tests of nine skills which he believed to be components 
of reading comprehension. The list of nine reading skills that 
were appraised by Davis’ tests included: 

1] knowledge of word meanings

2] ability to select the appropriate meaning for a word or 
phrase in the light of its particular contextual setting 

3] ability to follow the organization of a passage and to 
identify antecedents and references in it 

4] ability to select the main thought of a passage 

5] ability to answer questions that are specifically an- 
swered in a passage 

6] ability to answer questions that are answered in a pas- 
sage but not in words in which the question is asked 

7] ability to draw inferences from a passage about its 
contents 

8] ability to recognize the literary devices used in a passage
and to determine its mood and intent

9] ability to determine a writer’s purpose, intent, and
point of view, i.e., to draw inferences about a writer. 

A factor-analysis of the results of these tests administered to 
a group of college freshmen produced only five statistically sig- 
nificant skills; two of them — word knowledge and reasoning — 
accounted for 89 per cent of the total variance. The five statis- 
tically significant reading comprehension skills were: 1] word 
knowledge, 2] ability to reason in reading, 3] ability to fol- 
low the organization of a passage and to identify antecedents 
and references in it, 4] ability to recognize the literary devices 
used in a passage and to determine its tone and mood, and 
5] tendency to focus attention on a writer’s explicit statements 
to the exclusion of their implications. 

Thurstone (1946) questioned the validity of Davis’ findings 
on two bases: 1] Davis’ data showed that the nine tests were 
measures of the same reading function, and 2] the tests re- 
vealed no evidence about the components of reading compre- 
hension. Davis (1946) refuted Thurstone’s re-analysis and 
suggested that the factors he had revealed 

ought to provide individuals actually engaged in teaching 
children to read and in constructing tests of comprehension 
in reading with improved insight into the nature of reading 
comprehension and with clues for improving the teaching 
of reading and the measurement of reading. (1946, p. 188)

In a more recent study, Hunt (1957) examined the correlations
of a number of sub-tests of reading comprehension to determine
whether each of the measures of reading comprehension
which he developed was a distinct and measurable skill. Hunt
concluded that each sub-test was measuring essentially the same 
thing and that, therefore, diagnostic measures of reading com- 
prehension needed further study. Davis (1968) interpreted 
Hunt’s study in a summary of research on measuring reading 
comprehension and concluded that the results of Hunt’s study 
were in harmony with Davis’ (1944) findings:

Hunt, therefore, concluded that only the vocabulary
items were measuring a skill in comprehension (knowledge 
of word meanings) that was significantly different from the 
others. This implies that comprehension in reading in- 
volves two skills: word knowledge and paragraph compre- 
hension. These results are in harmony with Davis’ find- 
ings that word knowledge and reasoning in reading account 
for virtually all of the variance of comprehension. (1968, 
p. 508) 

Whether, in fact, they were in agreement is open to question. 
In a more recent study, Davis (1968) reinvestigated the unique 
variances of sub-tests of reading comprehension with a set of 
carefully constructed tests designed to measure specific aspects 
of reading comprehension. The five skills which Davis isolated, 
listed in order of unique variance contributing to total compre- 
hension scores, included: 1] memory for word meanings; 
2] drawing inferences from the content; 3] following the 
structure of a passage; 4] recognizing a writer’s purpose, atti- 
tude, tone, and mood; and 5] finding answers to questions 
asked explicitly or in paraphrase. These factors are quite simi- 
lar to those listed by Davis in 1944; however, this is not surpris- 
ing since the tests used by Davis in 1968 closely paralleled those 
used in the 1944 study. 

In all of the factor-analysis studies reported above, multiple- 
choice tests were used. Vernon (1962) questioned whether 
the overlap between measures of reading comprehension might 
be caused not only by the methods of testing but also by
students’ ability to take certain kinds of tests. The tests used 
by Vernon in this study included multiple-choice vocabulary 
questions, filling in the blanks in sentences, vocabulary defi- 
nitions supplied by the examinee, reading comprehension ques- 
tions for which the examinees were allowed to look back at the 
passages after reading, and reading comprehension questions for 
which the examinees could not look back at the passages.
Vernon’s tests of various aspects of comprehension did result in 
higher correlations within than between the types of comprehension.
Vernon pointed out that this uniqueness could have
been the result of the technical bias of the passages rather than
the measurement of unique aspects of comprehension. 

McCullough (1957) studied whether different aspects of 
comprehension were measured by different types of questions. 
Data consisted of the responses of elementary school children to 
common types of reading comprehension questions. Included 
were questions designed to measure comprehension of main
ideas; facts or details; sequence or organization; and creative
reading. McCullough found a statistically significant correlation
between all four types of reading comprehension questions.
However, because the correlations were not substantial, McCullough
cautioned that the measurement of any one of the skills
could not be substituted for any of the others.

While McCullough’s suggestion is perhaps valid, there is still 
a lack of understanding about basic aspects of reading compre- 
hension. The results of the preceding studies do demonstrate 
that most attempts thus far to validly measure specific sub-skills 
of reading comprehension have not been consistent. Because 
of this failure to delineate the basic measurable components of 
reading comprehension satisfactorily, the best procedure in- 
volves using a variety of measures. Included could be tests of 
the reader’s ability to recall specific facts, make generalizations, 
draw conclusions, draw inferences, and reorganize and organize 
ideas. Sub-scores from any of these tests should not be used 
independently in any attempt to diagnose reading compre- 
hension, but rather should be combined to measure reading 
comprehension or, as Stroud put it, the effective use of reading 
text. In selecting a test of reading comprehension, the potential 
test consumer should select one which appears to measure read- 
ing comprehension as he has taught it. Despite the lack of 
evidence regarding the individual aspects of reading compre- 
hension, it still is more valid to select a test which appears to 
measure the skill which has been taught. If, for example, the
instructional program has focused on teaching students to draw 
inferences from reading selections, then the reading comprehension
test used to measure growth toward this objective
should not consist of a series of questions which measure
immediate retention of specifically stated facts. 

What are the problems affecting the measurement of reading 
comprehension? Certainly the effects of timing, of allowing ex- 
aminees to look back at the reading selection, of prior knowl- 
edge of the content or the topic of the reading selection, of the 
language structure and length of the selection, of personality traits
of the examinees, and of purposes for reading all contribute to the
complexity of measuring reading comprehension. 

The effects of time on reading comprehension are quite con- 
siderable. Indeed, if the measurement of reading comprehen- 
sion is timed, then speed of reading comprehension rather than 
power of reading comprehension is being evaluated. Because 
of the fairly common practice of timing reading comprehension 
tests and the contemporary stress on speed, speed of reading 
comprehension is discussed as a separate skill later. The pres- 
ent discussion involves only untimed tests of reading compre- 
hension or, as it may be labelled, power of reading comprehen- 
sion. 

One variable which has received little attention is the effect 
of allowing examinees to look back at the selection while an- 
swering questions about it after they have completed reading it. 
The decision as to whether to allow this should be made by the 
examiner, based on the objectives of the instructional program. 
If subjects are allowed to look back, is reading comprehension 
actually being measured? If they are not allowed to look back, 
does the test become a measure of immediate memory span? 
Perhaps some of the findings of studies such as Anderson and 
Tinker’s (1932), Eurich’s (1930), Gray’s (1917), and Tinker’s 
(1932), all of which showed a strong relation between rate of 
reading and comprehension, resulted from not permitting subjects
to look back. Flanagan (1938) found an inverse relationship
between rate of reading and extent of comprehension
when subjects were allowed to look back at the selections. In
a study reported previously, Vernon (1962) compared comprehension
scores based on subjects’ responses when they were
not allowed to refer back to the text in answering questions with
scores obtained when they were allowed to. Vernon concluded that the two
procedures did indeed appear to measure different skills and 
that the latter procedure of allowing students to refer back to 
the text was more predictive of academic achievement. 

The influence of prior knowledge is another factor which 
must be considered when attempting to measure reading comprehension.
Preston (1962) administered the first thirty comprehension
questions of the Cooperative English Test: Reading
Comprehension to 128 college freshmen. The reading selections
did not accompany the multiple-choice items. Greater
than chance scores were achieved by 77 per cent of the stu- 
dents. Preston, therefore, concluded that there are many read- 
ing comprehension items that are probably invalid because stu- 
dents are able to recognize correct answers without reading the 
passages. 

Marks and Noll (1967) have proposed a technique for deal- 
ing not only with this problem but also with the broader ques- 
tion of what reading comprehension tests measure. They sug- 
gested administering the comprehension items without reading 
selections and then re-administering them several weeks later 
with the reading selections. If the number of students answering
a particular item correctly under both testing conditions is
greater than chance, it is highly probable that the item is testing 
something other than reading comprehension. 
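
The first step of such an item check can be sketched as follows. This is a simplified illustration and not the procedure Marks and Noll themselves specified: it looks only at the administration without reading selections and asks whether the number of students answering an item correctly exceeds what guessing alone would produce. The function names, the four-option item format, and the .01 criterion are all assumptions made for the example.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of k or more successes in n trials, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_passage_independent_items(correct_counts, n_students, n_options=4, alpha=0.01):
    """Return the items answered correctly, without the passage, more often than chance.

    correct_counts maps an item label to the number of students who answered
    it correctly when no reading selection was supplied (hypothetical data).
    """
    chance = 1 / n_options  # probability of guessing a multiple-choice item
    flagged = []
    for item, n_correct in correct_counts.items():
        if binomial_tail(n_students, n_correct, chance) < alpha:
            flagged.append(item)
    return flagged


# Invented counts for 100 students and two four-option items.
counts = {"item_1": 25, "item_2": 60}
print(flag_passage_independent_items(counts, n_students=100))
```

An item flagged in this way would then be examined again after the second administration, with the reading selections, as the authors proposed.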

Another area which might influence reading comprehension 
is language structure and length of the reading selection. 
Ruddell (1965) examined the relationship between reading 
comprehension scores and the similarity of the structure of oral 
language used by children. The analysis of the language struc- 
tures was based on work done by Strickland (1962). The pri- 
mary factor of language structure manipulated was word order. 
Ruddell (1965, p. 273) concluded: “Reading comprehension 
scores on materials that utilize high frequency patterns of oral 
language structure are significantly greater than reading comprehension
scores on materials that utilize low frequency of oral
language structure.” It is certainly clear from this particular
study that children’s language patterns should be considered 
when parallel test forms are being constructed. 

Length of reading selections has been shown not to have any 
differential effect on the reading comprehension of examinees 
(Traxler, 1938; Humphrey, 1957). Students who score high 
on comprehension tests covering short selections also tend to 
score high on comprehension tests covering relatively longer se- 
lections. However, while there does not seem to be any discrim- 
inant validity involved, it does seem worthwhile to suggest that 
both longer and shorter selections should be included on any
general test of reading comprehension. But the scores derived 
from each should not be considered separately; for the most 
part, short selections are used on standardized reading tests. 
The use of these tests in research studies may have led to the 
conclusion that length of the selection does not affect reading 
comprehension. 

There has been a noticeable absence of significant findings in 
studies relating personality variables to reading comprehension 
test scores. Most of the studies [for example, the Gann (1945), 
Garrett (1949), and Spache (1954) studies] have failed to re- 
veal any personality patterns for poor readers. Kleck and 
Wheaton (1967) did find that dogmatism was highly related 
to reading comprehension scores on opinion-consistent and 
opinion-inconsistent information. 

Establishing purposes for reading has been shown to signifi- 
cantly influence the reading comprehension of good readers but 
not of poor readers (Helen K. Smith, 1961). Henderson 
(1965) found that fifth graders differed in their ability to for- 
mulate a purpose for reading and that this ability was positively 
related to reading comprehension scores. It should be noted, 
however, that all of the students in Henderson’s study were av- 
erage readers. 

The cloze procedure is a relatively new technique for measuring
reading comprehension. The procedure involves deleting
certain words from a reading selection and then requiring
the examinee to supply the missing word; usually every fifth 
word is deleted. Bormuth (1967) tried to establish a frame of 
reference for the interpretation of cloze scores. He administered
a fifty-item cloze test and a 31-item multiple-choice test
for each of nine passages to 73 pupils in grades four and five. 
The results indicated that a score of 38 per cent correct comple- 
tions on the cloze test was equal to a comprehension score of 67 
per cent and that a cloze score of 50 per cent was equal to a 
comprehension score of 87 per cent. 
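
The mechanics of the cloze procedure described above can be illustrated with a short sketch. It deletes every fifth word, as the text indicates is usual, and scores responses as the percentage of deleted words restored. The function names are hypothetical, and scoring by exact word match (ignoring case and punctuation) is an assumption of this example; the scoring rule used in the studies cited is not stated here.

```python
import re


def make_cloze(text, every=5):
    """Replace every `every`-th word with a blank; return the test text and the answers."""
    answers = []
    output = []
    for position, word in enumerate(text.split(), start=1):
        if position % every == 0:
            answers.append(word)
            output.append("_____")
        else:
            output.append(word)
    return " ".join(output), answers


def score_cloze(answers, responses):
    """Percentage of blanks filled with the exact deleted word."""

    def normalize(word):
        # Ignore case and punctuation when comparing words.
        return re.sub(r"[^\w]", "", word).lower()

    correct = sum(normalize(a) == normalize(r) for a, r in zip(answers, responses))
    return 100 * correct / len(answers)


passage = "one two three four five six seven eight nine ten"
test_text, key = make_cloze(passage)
print(test_text)
print(score_cloze(key, ["Five", "eleven"]))
```

A percentage obtained in this way is the kind of cloze score that Bormuth sought to relate to multiple-choice comprehension scores.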

Hafner (1964), using college students as subjects, investi- 
gated the relationship of various measures to cloze scores. In 
this study, not only did cloze scores correlate positively and sig- 
nificantly with measures of intelligence, vocabulary, and infor- 
mation, they also compared favorably with standard prediction 
of course grades. 

The confusion concerning the components of reading comprehension
has led to several serious problems for the test consumer.
Of prime importance are the problems concerning what
reading comprehension test to select, whether that test should 
include a variety of sub-tests, whether it should be timed, and 
what format the content and language structure of the reading 
selections should follow. On the basis of the review of present 
research on measuring reading comprehension, it is probably 
best to select an untimed test which includes a variety of kinds 
of questions, but these should not be combined in any attempt 
to develop diagnostic sub-tests. The reading comprehension 
measures selected are likely to be valid if the language structure 
and content of the selection follow patterns familiar to the ex- 
aminees. In addition, the selections used in the tests should be 
of various lengths and cover a variety of topics. Also, for some 
of the selections, the examinees should be allowed to look back 
at the selection; for others, they should not. Finally, the test 
should provide specific purposes for reading. A test along the
lines described here might well provide a useful measure of general
reading comprehension power.



Validity and reliability of reading comprehension measures

Relatively few predictive validity studies of reading compre- 
hension have appeared in the literature. In one study, Webb 
and McCall (1953) used reading comprehension scores to pre- 
dict academic performance for college freshmen. In another, 
Murphy and Davis (1949) found a negative relationship be- 
tween ability to reason in reading and academic achievement for 
college freshmen. According to Murphy and Davis, this might 
be due to the heavy emphasis placed on acquisition of factual 
material in freshmen courses. However, the procedure used in 
determining reasoning in reading was somewhat questionable: 
the vocabulary score was subtracted from the level of compre- 
hension score on a reading test and the difference was labelled 
reasoning in reading. 

Construct validity studies of reading comprehension tests 
have been more common. If reading comprehension test scores 
are valid measures of reading comprehension, then increased 
scores on these tests should be related to increased comprehen- 
sion of common reading materials. One approach to testing 
this supposition would be to compare reading comprehension 
test scores with students’ ability to comprehend a series of 
increasingly difficult selections. Peterson (1956) followed this 
procedure in developing a set of ten 100-word passages ranging
in difficulty according to the Flesch formula from 5 to 95;
six multiple-choice questions followed each selection. The high 
school seniors in the study were also administered the General 
Reading and Comprehension sub-tests of the Diagnostic Read- 
ing Test: Upper Level. A statistically significant relationship 
was found between the standardized reading test scores and the 
comprehension scores for the 100-word passages; comprehension
of the passages decreased as the Flesch reading difficulty
scores increased. The small number of questions used by 
Peterson resulted in a very small variance on the reading comprehension
test and this limited the applicability and interpretation
of his findings.

Studies comparing reading grade scores derived from a
standardized reading test with actual reading performance on 
reading material for that grade level represent a validity index 
for the reading grade scores. Most of this kind of research has 
compared informal reading test performance to standardized 
reading test performance (McCracken, 1962; Sipay, 1964; 
Glaser, 1964). In general, the research points to the conclu- 
sion that the reading grade scores from standardized reading 
tests are approximately two grades higher than performance on 
individually administered informal tests. Michaelis and Tyler 
(1951) supported this finding when they compared scores on 
the Iowa Silent Reading Tests to comprehension of social stud- 
ies material. The mean standard score on the Iowa test was 
174, which corresponded to a grade equivalent of 13.0. The 
social studies materials, which were designed for use in high 
school classes, averaged about grade-thirteen difficulty accord- 
ing to the Lorge, Flesch, and Dale-Chall readability formulas. 
A total of 69 questions followed the selections; the mean per- 
centage correct was only 62 per cent. The difficulty of the com- 
prehension items was a significant factor in Michaelis and 
Tyler’s study. Despite this finding, there is considerable doubt 
about the usefulness of the grade scores of standardized 
reading tests for determining students’ functional reading levels. 
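
Since several of the studies above rest on readability formulas, a brief sketch of one of them may be useful. The Flesch Reading Ease formula is computed from average sentence length and average syllables per word; higher scores indicate easier material. The constants below are those of the published formula, but the syllable counter is only a rough vowel-group heuristic introduced for this example, so the scores it yields are approximate. The function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, discounting a final silent e."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


easy = flesch_reading_ease("The cat sat on the mat.")
hard = flesch_reading_ease(
    "Comprehension measurement necessitates considerable methodological sophistication."
)
print(round(easy, 1), round(hard, 1))
```

Very short samples such as these can fall outside the usual 0 to 100 range; the formula is intended for passages of about one hundred words or more.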

Another attempt to understand the composition of reading 
comprehension test scores was undertaken by O’Donnell 
(1963). O’Donnell hypothesized that reading comprehension
scores would be more highly related to awareness of structural
relationships of words in sentences than they would be to the ability
to verbalize grammatical rules and terminology. The two
tests used by O’Donnell to measure these abilities included a 
specially designed test of recognition of structural relationships 
in English and the Iowa Grammar Information Test. The findings
did not support the hypothesis: the two variables (awareness of
structural relationships and ability to verbalize rules and termi- 
nology) correlated about equally with reading comprehension 
scores (.44 and .46, respectively). These correlations, however,
indicate that only about 19 per cent of the variance in the
reading comprehension scores can be accounted for by the two 
grammar tests. Thus, it would appear that knowledge and 
awareness of grammatical structures are not major factors con- 
tributing to reading comprehension. On the other hand, Jen- 
kinson (1957), by employing the cloze procedure, found that 
good comprehenders (those with high cloze scores) had a better 
understanding of language structure. The differences between 
the Jenkinson and O’Donnell findings can probably be accounted 
for by the two different methods of measuring reading compre- 
hension used. It is possible that while the ability to respond to 
multiple-choice factual recall questions is not related to aware- 
ness of language structure, they become related when compre- 
hension is based on ability to supply missing words in text. 
This ability is probably based on a combination of skills involving
the use of semantic and syntactic clues.
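
The variance figure cited for O’Donnell’s correlations follows from squaring a correlation coefficient, which gives the proportion of variance two measures share. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def variance_explained(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r**2


# Correlations with reading comprehension reported for O'Donnell (1963).
for label, r in [("structural relationships", 0.44), ("rules and terminology", 0.46)]:
    print(f"{label}: r = {r:.2f}, r squared = {variance_explained(r):.2f}")
```

Each test taken singly thus accounts for roughly one fifth of the variance. How much the two account for jointly would depend on the correlation between the grammar tests themselves, which is not reported here.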

The validity of reading comprehension scores as indicators 
of the amount of knowledge retained has also been subjected to 
study. Most reading comprehension tests are given immedi- 
ately following the reading of a selection and it is entirely possi- 
ble that this comprehension does not result in retention. 
Sharpe (1952) found that comprehension tests given immedi- 
ately after reading a selection and also at 1, 7, 14, 21, 28, and 
56-day intervals showed a forgetting curve similar to that for classroom
learned material: there was a gradual process of forgetting and 
not an abrupt falling away as might be expected. Sharpe’s 
study seemed to indicate that reading comprehension tests are 
usually measures of the amount of material learned and do not 
represent a unique behavior syndrome. 

Most standardized reading comprehension tests report split- 
half correlation studies as reliability evidence. While this evi- 
dence has some value, it should be noted that the effects of test 
format (i.e., both halves of the test contain the same type of 
items) probably increase this reliability index (Kerlinger,
1965). The manuals of many standardized reading tests also 
fail to report sub-test reliabilities. Perhaps this is because
many of the sub-tests are too short to have adequate reliability.
It has also been hypothesized by Sharpe (1952) that it is possi- 
ble to raise reliability indexes by allowing examinees to look 
back at the material rather than forcing them to rely more on 
memory without looking back. 
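
For readers unfamiliar with the split-half evidence mentioned above, the following sketch shows one common way such an index is obtained. It assumes an odd-even division of the items and the usual Spearman-Brown step-up from half-test to full-test length; the test manuals discussed here may have divided their tests differently. The function names and the item data are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Correlate odd-item and even-item totals, then step up to full test length."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction for doubled length
    return r_half, r_full


# Four examinees, four items each (1 = correct, 0 = incorrect); invented data.
scores = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 0]]
r_half, r_full = split_half_reliability(scores)
print(round(r_half, 3), round(r_full, 3))
```

Because both halves share the same item format, an index computed in this way is subject to the inflation noted above.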



Needed research in measuring reading comprehension 

The most pressing research need in measuring comprehen- 
sion is for a clear understanding of the nature of reading com- 
prehension. At this time, there is no conclusive evidence re- 
garding the components of this skill. Is it a unitary skill or is it 
a composite of sub-skills? If it is a composite of sub-skills, can 
each of the sub-skills be measured independently? As research 
progresses, it is likely that reading comprehension will be found 
to be composed of a variety of skills; at the same time, it also 
probably will be discovered that the skills are dependent on a 
particular set of conditions. These conditions might well in- 
clude the topic of a particular selection, the purpose for reading, 
the reading difficulty of the selection in comparison to the read- 
ing skill of the examinee, the measure of comprehension uti- 
lized, and the length and language structure of the selection. 

It seems reasonable to suggest that reading comprehension 
as a global skill is non-existent and that measurement attempts 
should be narrowed down to specific conditions. For example, 
a reading comprehension test for a student reading at about the 
seventh-grade level could be developed from a 200-word sci- 
ence selection of fifth-grade readability level. The student 
could be asked to read to understand and recall specific direc- 
tions on how to conduct a scientific experiment. The reading 
comprehension test could consist of a set of multiple-choice 
questions in which the student is not allowed to look back at the 
material. By varying any of these conditions, the skill being 
measured would probably be altered. 

Another research need is the development of criterion tests 
for measuring reading comprehension. What is the reading
comprehension level needed for effective citizenship? What
level of skill is needed to comprehend articles in daily newspa- 
pers? The goal of diagnostic teaching is to provide instruction 
based on individual needs and to determine progress toward 
specific goals. This implies that some goal has been defined. 
Usually standardized reading tests have been developed to com- 
pare one student’s reading performance with that of another 
rather than with some specific goal. This constitutes one of the 
major shortcomings of all such tests. 
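
The contrast drawn here, between comparing a student with other students and comparing him with a defined goal, can be illustrated with a small sketch. The scores, the goal of 40 items, and the function names below are hypothetical and are not taken from any published test.

```python
from bisect import bisect_left

def percentile_rank(score, norm_scores):
    """Per cent of the norm group scoring below `score` (comparison with peers)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores strictly below
    return 100.0 * below / len(ordered)

def meets_criterion(score, cutoff):
    """True if the score reaches a stated goal (comparison with a criterion)."""
    return score >= cutoff

# Hypothetical raw scores for a norm group of ten students.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
student_score = 30
goal = 40  # assumed number of items needed for a stated reading goal

rank = percentile_rank(student_score, norm_group)
reached = meets_criterion(student_score, goal)
print(f"percentile rank: {rank:.0f}, goal reached: {reached}")
```

In this invented case the student outscores most of the norm group yet still falls short of the stated goal, which is the kind of information a comparison among students alone cannot supply.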

With the exception of the cloze procedure, which was used 
in intelligence testing many years ago, there have been few new 
procedures for measuring reading comprehension in over forty 
years. If these “old” methods have been shown not to be valid, 
new ones should be tried. Perhaps more importantly, the
widespread notion that comprehension is a separate measurable 
sub-skill of reading should be thoroughly investigated. 



Rate of comprehension 

A review of the literature on the measurement of reading 
rate and reading comprehension reveals that most researchers 
are concerned with the degree of relation between these two 
variables. The general conclusion regarding this relationship 
which seems tenable, given the research available (Letson, 
1958; Shores & Husbands, 1950), has been summarized by 
Rankin (1962, pp. 4-5): 

In conclusion, it appears that the confounding of rate 
and comprehension in measurement is, at least in part, re- 
sponsible for some of the earlier findings that “fast readers 
are good readers.” Other studies of the relationship be- 
tween rate and “power of comprehension,” find only a 
slight relationship. When the material is more difficult, 
when more critical thought processes are involved, and 
when the reader’s purpose is more exacting, the relationship 
between reading rate and comprehension is minimal. 

Rather than a continued assault on the relationship between
rate and comprehension, it seems worthwhile to pursue the measurement
of rate of reading comprehension as a unique skill.
Certainly, the independent measurement of rate and compre- 
hension may have diagnostic value, but the measurement of 
the speed at which a reader comprehends a reading selection 
also has value. Several tests, such as the Cooperative English Test:
Reading Comprehension and the Gates-MacGinitie Reading Tests: Survey,
provide measures which combine reading rate
and comprehension.

In previous discussions of the various reading skills, a review 
of validity and reliability studies followed the presentation of 
the problems involved in measuring that particular skill. 
Because of the lack of such studies on the rate of comprehen- 
sion, this section includes only those studies related to defining 
rate of comprehension and apparent problems in its measure- 
ment. 

Rate of comprehension is a useful variable in measuring 
reading achievement. Teachers should be legitimately con- 
cerned with how fast a student can accomplish a particular read- 
ing comprehension task. What is the construct of this skill? Bus- 
well (1951) investigated whether rate of silent reading 
(speed) varied directly with rate of thinking (comprehension); 
if it did, schools might provide special reading instruction for 
slow readers who are fast thinkers. Buswell did find a positive 
relationship between rate of thinking and rate of reading. 
However, his population — 77 college seniors at the University 
of California — seriously limits generalizations from the study. 

The concept of rate of comprehension is very closely related 
to that of reading flexibility. In measuring rate of comprehen- 
sion, what the teacher needs to know is how fast a reader 
achieves his purpose, i.e., how quickly he understands the selec- 
tion (McDonald, 1965; Sheldon & Carrillo, 1952). The teach- 
er does not have to know that a reader can pass over words at 
300, 800, or 1200 words per minute; what he needs to know is 
how long it takes the reader to comprehend the material for a 
given purpose. The purpose is, of course, very important. If a
student is asked to determine the general content of a selection,
he would be expected to read at a rate different from that which 
he would use if asked to read to determine the specific causes 
leading up to a certain event. Whenever a test utilizes specific 
purposes for reading, the examiner should be aware that the 
test’s purpose is always modified by the reader’s purpose. 

According to McDonald (1958), most research has failed to 
reveal that readers tend to change their reading rate to satisfy 
particular purposes unless special instruction is provided to this 
effect. The most meaningful measures of reading flexibility 
should include comprehension and speed when reading for a 
particular purpose. Sheldon (1955) found that college stu- 
dents who had been identified as good readers varied their read- 
ing speed considerably depending on the type of material read. 
Their comprehension scores were also uniformly high. The 
poor readers, on the other hand, had a very uniform (about 300 
words per minute) reading rate regardless of the type of mate- 
rial read or purposes given for reading, while their comprehen- 
sion varied greatly. 

The difficulty level of the material to be read is a limiting 
factor in measuring rate of comprehension. Hill (1964) found 
that purpose for reading had little influence on reading rate and 
comprehension when college students were asked to read for 
one of three different purposes: 1] to read a particular selec- 
tion as a course assignment over which the reader was to be 
tested the following day; 2] to read the selection to identify its 
main ideas; and 3] to read the selection to analyze critically 
the motives and attitudes of the author. The reading selections, 
which Hill stated dealt with relatively complex concepts, were 
used as experimental tests. They were written for the well- 
educated adult and presented organizational patterns and author 
attitude in a definite but subtle manner. It is quite possible that 
the complexity of the reading material prevented any reader 
flexibility. 

A somewhat contradictory finding was reported by Letson 
(1958), who studied the relationship of reading speed and comprehension
on easy and difficult reading material for college
freshmen. Letson’s results indicated: 1] the relationship be- 
tween speed scores on difficult and easy materials was high; 
2] the relationship between comprehension scores on difficult 
and easy materials was moderate; and 3] the relationship be- 
tween speed and comprehension scores was high for easy mate- 
rial, but decreased as the difficulty of the material increased. 
However, the readers in the study tended to maintain a reading 
rate independent of the difficulty of the material or the purpose 
for reading. 

A major problem in the measurement of rate of comprehen- 
sion resides in the fact that such measures are bound to be con- 
founded by artifacts resulting from the measurement proce- 
dures. Letson noted that when speed and comprehension are 
measured simultaneously, the resulting score includes the time 
taken to read the selection, to read the question, and to look 
back and re-read the text — perhaps several times. Letson sug- 
gested that such a measure would be a speed of working, rather 
than a speed of reading score. 

Another measurement problem is the effect of interrupting 
students during their reading when the examiner is attempting 
to determine rate of comprehension. The Nelson-Denny Read- 
ing Test, for example, asks students to note how far they have 
read after reading a selection for one minute; students then go 
on to complete reading the selection and answer the compre- 
hension questions which follow. The Gates-MacGinitie Read- 
ing Tests: Survey measure rate by a modified cloze procedure. 
This interrupts the student’s reading to the extent that he must 
consider the correct alternative to fill a blank in the reading text. 
There is some evidence that this interruption could affect any 
attempts to measure rate of reading comprehension. 

McDonald (1960) studied the reading rate and comprehen- 
sion of 117 college students under four timing procedures in- 
volving various amounts of interruption. Reading performance 
was significantly hampered by periodic interruptions; reading 
rate was not affected, but a significant reduction in reading comprehension
was noted. McDonald concluded: “Timing procedures
which produce periodic interruptions during the reading
process should be avoided. Methods of timing reading which 
minimally interrupt the students should be selected” (1960, p. 
33). 

Differential effects on reading comprehension scores, as the 
result of an interruption, have been found for slow and fast 
readers. Cook (1957) attempted to discover if time announce- 
ments during the administration of reading tests given to all en- 
tering students at the University of Iowa affected comprehen- 
sion scores. Significantly poorer comprehension was noted for 
the slower readers than for the fast readers. 

The content of the reading selections has also been shown to 
be a significant factor in the relationship between reading rate 
and comprehension. Thurstone (1944) obtained correlations 
between rate and comprehension of .11 on physical science ma- 
terial, .42 on literary material, and .44 on social science mate- 
rial. 

A further measurement problem is caused by the finding that 
students can increase their reading speed, without any loss in 
comprehension, under a set of instructions to read faster. 
Maxwell (1965, p. 186) supported this hypothesis in a study in- 
volving 104 college students, concluding that the “ . . . study 
has shown that instructing students to read faster on a standard- 
ized test results in a significantly faster reading rate, and further 
suggests that reading test speed increases as a function of a 
warm-up period.” 

Fricke (1957) studied the results of the Cooperative English 
Test: Reading Comprehension to determine if speed of reading 
scores and level of reading scores could be replaced by two new 
scores: rate and accuracy. Both the speed and accuracy scores 
were rate of comprehension scores; however, the speed score 
suggested by the manual for the Cooperative test was the num- 
ber of correct answers less one quarter of the wrong answers. 
Fricke stated that this score does not validly measure the rate of 
comprehension of the fast but careless reader. He suggested
that the rate score (speed of comprehension) should be simply
the number of correct responses. 
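
The two scoring rules discussed by Fricke can be compared directly. The sketch below assumes a reader's record is simply a count of correct and wrong answers; the function names and the two readers are hypothetical.

```python
def corrected_speed_score(correct, wrong):
    """Manual's speed score: correct answers less one quarter of the wrong ones."""
    return correct - wrong / 4

def fricke_rate_score(correct):
    """Rate score suggested by Fricke: simply the number of correct responses."""
    return correct

# Two hypothetical readers with the same number of correct answers.
careful = {"correct": 40, "wrong": 0}
fast_careless = {"correct": 40, "wrong": 20}

for name, record in (("careful", careful), ("fast but careless", fast_careless)):
    print(name,
          corrected_speed_score(record["correct"], record["wrong"]),
          fricke_rate_score(record["correct"]))
```

Under the manual's rule the careless reader's score drops from 40 to 35, while under Fricke's rule both readers receive 40, leaving carelessness to be reflected in the separate accuracy score.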

Research on the measurement of rate of reading comprehension
must first focus on the utility of such a measure. No
research evidence has been found which relates this
score to the objectives of the reading instructional program.
How can such a score be utilized? Is the knowledge of student 
performance on this measure of educational value to the class- 
room teacher? Is the improvement of speed of comprehension 
when it is related to specific purposes an important objective of 
reading programs? Future research should certainly focus on 
these questions. 

Specific research in measuring rate of comprehension should 
investigate more carefully the effects of the difficulty of mate- 
rial, the interest level of the selections, readers’ purposes, and 
the effects of certain timing and scoring procedures. Almost all 
of the studies which have begun to examine these variables have 
used college students. Much work needs to be done at younger 
age levels. If flexible reading patterns are important for col- 
lege readers, then they would also seem to be important for ele- 
mentary and high school students. Perhaps future research will 
conclude that there is no general rate of reading comprehension; 
instead, it might well prove that for each reader there are a 
number of reading rates dependent upon some of the previously 
mentioned variables such as purpose and difficulty of materials. 
If this should be the case and if there is general agreement that 
improvement in rate of comprehension is an important objective 
of the reading program, a variety of tests for use in differing 
class situations need to be developed and/or teachers need to be
trained to assess this skill informally in each learning situation. 



What can be measured 

From a review of the previous studies, it is quite clear that 
the measurement of reading behavior is based on logical rather 
than empirical evidence. Research studies regarding the measurement
of sub-skills of reading are very limited. Where there
have been studies, there is more negative than positive evidence 
to support existing measures of the sub-skills of reading. In ad- 
dition, for the common sub-test of reading behavior, there is 
great confusion concerning the most appropriate method of 
measurement. There are far more procedures utilized for meas- 
uring any single sub-skill of reading than there are hypothe- 
sized sub-skills of the total reading act. Many studies also con- 
clude that the tests of reading skills fail to measure the more im- 
portant aspects of the skill but focus instead on the superficial. 

Researchers in this area have also voiced a fairly consistent 
plea that teachers employ more specific measures of reading 
ability. This means that teachers need to more carefully de- 
fine their teaching objectives and then select or construct a test 
which matches those program objectives. This procedure 
would automatically increase the validity of the test. 

Now that it has been established that many tests fail to meas- 
ure validly what they purport to measure, that no one seems to 
know whether sub-skills of reading can be measured, and that 
there is a lack of measures for assessing more complex reading 
behaviors, it seems appropriate to focus on research on procedures
for assessing students’ reading abilities.

While there is no question that standardized, formal, and informal
reading tests supply information, there is little agreement
among reading specialists as to the nature of this information 
and the value it has for the classroom teacher. Standardized
tests obviously indicate how a student performs in relation to
other students at one point in time, but they rarely account for
why the student performs as he does. As mentioned earlier, 
many factors can influence student performance. These factors 
range from the student’s experiential background to the interest 
appeal of selections included on any given test. They can limit
both the accuracy of testing devices and their usefulness as valid
indicators of student performance. Because tests merely de- 
scribe reading behavior rather than explain it, it may well be the 
case that tests can supply only limited information as a basis for 
classroom instruction. However, it is assumed for the purposes 
of the present chapter that knowing how a student performs is 
information enough to allow the teacher to proceed with in- 
struction. Thus, it is legitimate to raise the question of whether 
testing instruments actually provide a true estimate of student 
achievement. Research has dealt with this problem in one form 
or another many times. A good example of this is Glaser’s 
(1964) study in which the performance of a group of third 
graders was compared with that of seventh graders on the Gates 
Reading Survey. While the two groups scored the same on the 
Gates survey (between 5.0 and 5.9), their performance on an 
informal reading inventory differed considerably. This was due
to the fact that on the Gates, assessment of skills takes place by
comparing an individual’s performance with other students’ per- 
formance; on the informal inventory, how well a student reads is 
determined by his performance on a set of criteria tasks. If 
anything, the Glaser study underlines the dangers inherent in 
depending on any one measure as an indicator of student per- 
formance. Therefore, if one seeks to get an accurate assessment 
of student achievement, it is advisable to use a wide variety of 
reading measures including informal inventories, standardized 
tests, teacher observations, and teacher assessment of perform- 
ance in content areas. The validity of using a variety of 
measures has been well substantiated in the research literature. 
For instance, Croft (1951) used intelligence tests, medical his- 
tories, arithmetic and reading achievement tests, social adjust- 
ment and interest inventories, sociograms, and social back- 
ground data to assess the achievement of a group of students. 
He found that this particular combination of measures provided 
a more useful and accurate basis for planning instructional pro- 
grams than any one of the measures did when used singly. 

The previous paragraph should give the reader a general 
idea of some of the problems inherent in using testing devices. 
The following pages expand these ideas and provide some sug- 
gestions for test usage. The focal point of this chapter is not on 
the philosophical aspects of testing; rather, it is on the viability of
two kinds of devices commonly used in evaluating reading per- 
formance: standardized diagnostic tests and informal reading in- 
ventories. 



Standardized tests 

The teaching of reading, if it is to be effective, should be 
based on a thorough knowledge of the reading strengths and 
weaknesses of students. The central issue discussed in this
section is whether standardized tests provide such information
accurately and reliably. The most serious deficiency in
using standardized tests to diagnose reading achievement is the
lack of discriminant validity (the validity of tests as measures of
distinct skills or abilities) for the various sub-tests of reading.
This problem was discussed at some length in the preceding
chapter; without a doubt, it constitutes a major shortcoming of 
most standardized reading tests. If a teacher is planning an in- 
structional program geared to improve those specific reading 
skills which a test has shown to be weak, the teacher has to be 
reasonably confident that the tests he has used validly measure 
those skills. Most research on measuring specific reading skills 
has been either too limited or too equivocal to support the logi- 
cal contention that specific sub-skills of reading can be validly 
measured. For instance, Hunt (1957) and Farr (1968) both 
questioned the diagnostic validity of sub-tests of reading. The 
lack of such diagnostic validity was attributed by Goodman 
(1968) to a lack of understanding of the reading process.

The discussion which follows focuses primarily on reading 
and psychological tests which have been used for diagnostic 
purposes and on the reliability and validity of group and indi- 
vidual tests. A separate sub-section on the Wechsler Intelli- 
gence Scale for Children has been included because of numerous 
studies devoted to its use as a diagnostic tool. 

Group tests 

This monograph has already discussed the limited validity of 
group tests and sub-tests of reading skills. What has yet to be 
explored is the use of group tests as diagnostic tools and the 
possibility of improving these tests so that they accurately assess 
reading ability. 

Davis (1961, p. 86) outlined four steps which should be ad- 
hered to if standardized group tests are used: 

1] Carefully and explicitly define the variable being 
measured. 

2] Administer a test of the variable, or as close an approximation
as possible, under conditions that assure a
high degree of cooperation on the part of the pupil.

3] Compare a pupil’s obtained score with suitable 
norms, such as percentile ranks in his own age or 
grade group. 

4] Consider the possibility that the pupil’s obtained 
score represents a sizeable deviation from his true 
score.

These four steps emphasize the importance of matching instructional
objectives to test objectives if the test is to validly measure
specific reading skills. Hills (1964) has elaborated on this
point and has offered ten questions that should be fully an- 
swered before any particular measure of skills is selected for 
use. 

1] Is the test appropriate for the consumer’s purposes?

2] What does the test purport to measure? 

3] What do reviewers think it measures? 

4] What are the item content and style? 

5] Is the test a test of speed or of power? 

6] Does the test contain a correction for guessing? 

7] Does the structure of the items provide clues to the 
answers? 

8] Are there alternate forms? How well are they 
matched? 

9] What are the norm groups (kind, quality, characteris- 
tics)? 

10] Is the range wide enough (is there enough top and 
bottom)? 
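
Davis's fourth step, that an obtained score may deviate considerably from the true score, is commonly expressed through the standard error of measurement, SEM = SD x sqrt(1 - reliability). This formula is standard in classical test theory but is not given in the source; the standard deviation, reliability, and obtained score below are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def score_band(obtained, sd, reliability, z=1.0):
    """Band of plus or minus z SEM around an obtained score."""
    sem = standard_error_of_measurement(sd, reliability)
    return obtained - z * sem, obtained + z * sem

# Hypothetical sub-test: SD of 10 points, reliability of .84, obtained score of 50.
sem = standard_error_of_measurement(10.0, 0.84)
low, high = score_band(50.0, 10.0, 0.84)
print(f"SEM = {sem:.1f}; band = {low:.1f} to {high:.1f}")
```

With these assumed values the band runs from about 46 to 54, so two pupils whose obtained scores differ by only a few points cannot confidently be ranked.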

Even if a test consumer carefully adheres to both Davis’ and 
Hills’ guidelines, the tests he selects still may not validly assess 
reading achievement. Several studies have cast doubt on the 
diagnostic validity of any group reading test. Murray and Karlsen
(1960) conducted a concurrent validity study of the Gates
Reading Diagnostic Tests, an individually administered test, and 
the Developmental Reading Tests: Silent Reading Diagnostic 
Test, a group test. They found no agreement between the sub- 
tests of these two tests. The ten sub-tests Murray and Karlsen 
compared were perception of words in isolation, orientation re- 
versals, initial errors, middle errors, ending errors, word ele- 
ments, letter sounds, beginning sounds, rhyming sounds, and 
word synthesis. It is very difficult to interpret this study be- 
cause of the very small sample size (only twenty students). In 
addition, the use of grade level scores when examining differ- 
ences between mean scores is of questionable validity. Such a 
procedure is valid only if the two tests are normed on the same 
populations and this is very unlikely in the Murray and Karlsen 
study. 

Chall (1958) used two procedures in an attempt to validate 
the Roswell-Chall Diagnostic Reading Test of Word Analysis 
Skills. The first procedure involved comparing scores on the 
word analysis test with various criterion tests for three different 
populations: second graders, fifth graders, and a reading clinic 
group from various elementary grades. Second graders were 
administered the Gray Oral Reading Test, a silent reading test, 
and the spelling sub-test of the Metropolitan Achievement 
Tests. The clinic population took the same spelling and oral 
reading tests as did the second graders; however, the silent 
reading test they were administered was taken from the Metro- 
politan. The fifth-grade population in the study was admin- 
istered only the oral reading test. All these measures seem 
to be questionable criteria for validating a word analysis test. 
The second graders’ scores all correlated at a high level with the 
three criteria tests, but this may have been due to the fact that 
the second graders probably scored at the top of the scale of the 
word analysis test and there was, therefore, very little variability 
in their scores. The clinic population, whose scores on the 
word analysis test would be much more variable, had much 
lower correlations with the three tests: .73 with oral reading,
.64 with silent reading, and .57 with spelling. These correlations
indicate that only about half of the variance or less (roughly 53, 41,
and 32 per cent, respectively) was shared in common by the two tests; this is not
very conclusive validity evidence. Chall’s second procedure in- 
volved investigating the number of errors on each of the sub- 
tests of the word analysis test according to eight reading grade 
levels. The sub-test scores gave a very impressive picture of a 
decrease in the number of errors from the first through the 
eighth grade, thereby indicating a mastery of these skills as stu- 
dents progress through school. 
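
The proportion of variance two tests share is the square of their correlation. A minimal sketch using the three clinic-group coefficients reported above (the variable names are illustrative):

```python
# Correlations reported for the clinic population with each criterion test.
correlations = {"oral reading": 0.73, "silent reading": 0.64, "spelling": 0.57}

# Shared variance is the squared correlation coefficient.
shared_variance = {name: r ** 2 for name, r in correlations.items()}

for name, r2 in shared_variance.items():
    print(f"{name}: r = {correlations[name]:.2f}, shared variance = {r2:.0%}")
```

On these figures the word analysis test shares roughly half of its variance with oral reading and considerably less with the other two criteria.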

The evidence from these studies as well as from those studies
cited in Chapter 2 is consistent in pointing out that group
tests of reading achievement are quite limited as valid meas- 
ures of sub-skills of reading. Indeed, there does not seem to 
be any degree of consistency between or among test publishers 
and researchers about what these group tests actually measure. 
Yet there are valid uses for group tests in the instructional read- 
ing program. First of all, the tests are reliable for comparing 
students in terms of general reading achievement. Secondly, 
the tests can be used as a screening device in determining the 
need for and possibly the type of further assessment necessary. 
For example, a student who performs very poorly on a group 
test designed for intermediate grade children would probably 
need a more detailed assessment of his word attack skills, while 
a student who performs quite well perhaps needs more detailed 
evaluation of his ability to use sources of information more effectively.
Individual reading tests and informal testing procedures,
both the subject of subsequent parts of this chapter,
are valuable procedures for continuing the diagnosis which is
barely started by group standardized tests.

Individual tests 

For the purposes of this section, individual tests are defined 
as those tests which can be administered to only one examinee 
at a time. A variety of such individual tests have been used in
an attempt to diagnose the reading achievement of students, and
there has been a considerable amount of research devoted to assessing
their validity and reliability. One such study, conducted
by Sheldon and Hatch (1950, 1951) at Syracuse University, fo- 
cused on the Durrell Analysis of Reading Difficulty. In the ini- 
tial study (Sheldon & Hatch, 1950), third-grade students served 
as subjects. The criteria used for determining good and poor
readers were reading achievement tests and teacher ratings of 
reading ability. The lowest and the highest five per cent of the 
readers who had intelligence scores above 90 were administered 
the Durrell test. Results indicated that the Durrell test was a 
valid measure of reading achievement. Because the mean 
scores of the intelligence tests for the good and poor readers 
were significantly different, it is possible that the Durrell test was 
merely comparing intelligence levels and not reading levels. In 
a similar study with sixth-grade children, Sheldon and Hatch 
(1951) obtained almost identical results. While neither of the 
two investigations was undertaken primarily as a validity study
of the Durrell test, they did provide substantial evidence that
the Durrell Analysis of Reading Difficulty is a valid measurement
device for diagnosing reading difficulties when the criteria
used are teacher ratings and specific standardized reading
test scores. However, neither of these studies supplied any va- 
lidity evidence for the Durrell test as a diagnostic measure of 
specific reading sub-skills. 

A concurrent validity study of three of the most popular individual
reading tests (the Durrell Analysis of Reading Difficulty,
the Diagnostic Reading Scales, and the Gates-McKillop
Reading Diagnostic Test) was undertaken by Eller and Attea
(1966). They found that the word analysis and oral reading 
sub-tests of the three tests were highly correlated. It would also
have been useful had the researchers supplied correlations across
sub-tests, so that one could determine whether the tests showed any
discriminant validity. Despite the high correlations between these tests, there
were significant differences between the grade level scores for
the oral reading sub-tests. The Durrell test was, on the average,
about half a grade level lower than the Diagnostic Reading
Scales and about a third of a grade level lower than the Gates-McKillop
test. Eller and Attea used multiple t tests to study all
of these differences. Unfortunately, because repeated t tests raise
the chance of finding at least one spuriously significant difference,
this left the significance of their findings open to question.
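
One reason a series of separate t tests leaves significance open to question is that the chance of at least one false positive grows with the number of comparisons. The sketch below assumes independent comparisons, a per-test significance level of .05, and three pairwise comparisons among the three tests; none of these figures is reported in the source.

```python
def familywise_error_rate(alpha, comparisons):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** comparisons

def bonferroni_alpha(alpha, comparisons):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / comparisons

# Assumed values: three pairwise comparisons at the .05 level.
k = 3
rate = familywise_error_rate(0.05, k)
adjusted = bonferroni_alpha(0.05, k)
print(f"familywise rate: {rate:.3f}; Bonferroni per-test level: {adjusted:.4f}")
```

Under these assumptions the overall risk of a spurious difference is closer to .14 than to .05. The Bonferroni adjustment shown is one common remedy, named here only as an illustration and not as the procedure the researchers should have used.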

Some very interesting discrepancies in the scaling properties 
of a few of the sub-tests of the Durrell, the Diagnostic scales, 
and Gates-McKillop tests were found in the Eller and Attea 
study. Particularly striking are the weaknesses found in the lis- 
tening comprehension sub-test of the Durrell and the silent 
reading sub-test of the Diagnostic scales. The Durrell listening 
comprehension test is composed of a series of paragraphs of in- 
creasing difficulty which are read orally to an examinee after 
which he is asked a set of questions. This would seem to be a 
useful procedure for estimating reading potential. However, in 
Eller and Attea’s study, 99 per cent of the third-grade pupils 
passed the third-grade paragraph test, only 10 per cent passed 
the fourth-grade test, while 32 per cent of the same students 
passed the fifth-grade test. In determining silent reading level 
on the Diagnostic scales, the examinee is required to read orally 
a set of paragraphs of increasing difficulty until a certain num- 
ber of errors are made. The examinee then continues reading 
on the next level silently. The assumption on which this proce- 
dure is based is that silent reading is always more highly devel- 
oped than oral reading. Yet, Eller and Attea were not able to 
determine silent reading scores for 47 per cent of the third grad- 
ers because their silent reading achievement was not, according 
to the test, as well developed as their oral reading achievement. 

Oral reading tests have been used extensively to diagnose 
reading in general. This is especially evident when one realizes 
that most standardized individual reading tests include oral 
reading sub-tests. For example, the Durrell Analysis of Read- 
ing Difficulty, the Gates-McKillop Reading Diagnostic Test, and 
the Diagnostic Reading Scales all include sub-tests of oral reading.
The use of oral reading tests to diagnose reading achievement
is based on the assumption that there is a high correlation
between silent and oral reading, as some researchers (Fair- 
banks, 1937; Gilmore, 1947) have found, and that the use of 
oral reading tests as measures of silent reading achievement is, 
therefore, justifiable. This assumption has been criticized by 
Buswell (1947) on the grounds that oral reading involves dif- 
ferent skills than silent reading. 

The use of oral reading tests as a measure of silent reading 
achievement above the elementary grades has also been ques- 
tioned by Gray and Reese (1957). Gray and Reese cogently 
pointed out that silent reading achievement surpasses oral read- 
ing achievement, at least in terms of rate, after children reach a 
second-grade reading level. Wells (1950) studied oral reading 
tests on the college level. He sought to determine whether the 
analysis of oral reading errors would correlate with silent read- 
ing achievement for college freshmen of low academic ability. 
Non-significant correlations were found between oral reading 
mispronunciations and tests of silent reading comprehension 
and vocabulary. This alone would cast considerable doubt on 
the value of an oral reading test for diagnosing silent reading 
achievement with more mature readers. In referring to data 
presented by Gilmore (1947), Wells suggested that the
progressively lower correlations found between oral and silent
reading as the higher grade levels are reached indicate an
increasing tendency on the part of each of the two reading skills
to become independent.

A second problem in using oral reading performance as a 
basis for assessing silent reading involves the significance of cat- 
egorizing certain word call errors. Weber (1968) studied the 
classification systems of a number of researchers in the area of 
oral reading and those of a number of oral reading tests and 
concluded that these systems were based, for the most part, on 
whole word errors. Weber emphasized two serious weaknesses of
these tests. The first was their treatment of word errors as
isolated sets of letters rather than as parts of a sentence. The
second was that the standardization of the tests relied on the
total number of errors made, rather than the type of errors
made. This would lead to a situation where a student
who made five minor mispronunciations which did not inter- 
fere with his understanding of a selection would be grouped 
with a student who made five gross mispronunciations and failed 
to understand most of the selection. 
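
Weber's objection can be illustrated with a minimal sketch. The error labels and the two students' records below are hypothetical, chosen only to show that a total-error score hides differences in the type of error; they are not taken from any of the tests discussed.

```python
from collections import Counter

# Hypothetical oral reading records: one label per word call error.
student_a = ["minor_mispronunciation"] * 5
student_b = ["gross_mispronunciation"] * 5


def total_errors(errors):
    """Score by a bare count of errors, ignoring what kind they were."""
    return len(errors)


def error_profile(errors):
    """Score by type: how many errors of each kind were made."""
    return dict(Counter(errors))


same_total = total_errors(student_a) == total_errors(student_b)
same_profile = error_profile(student_a) == error_profile(student_b)
print(same_total, same_profile)
```

Under the count-only score the two students are indistinguishable, while the profile by type separates them.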

Contradictory evidence, however, has been reported by 
Spache (1950) in a study in which he attempted to compare the 
various norms given for the oral reading test of the Diagnostic 
Reading Scales, the Oral Reading — Unaided Oral Recall Test 
of the Durrell Analysis of Reading Difficulty, and Gray’s Oral 
Check Test. Under certain specified test procedures, Spache 
found the tests to be quite comparable. This finding indicates 
that it is possible to use tests, or at least these three particular 
tests, in various combinations to determine reading improve- 
ment during the course of a remedial program. The major dif- 
ference between Weber’s conclusion and Spache’s is that Weber 
was concerned with the diagnostic validity of recording oral 
word call errors while Spache was attempting to compare stu- 
dents’ general performance on several oral reading tests. 



The relation between psychological test scores and reading 

A number of psychological tests have been used to diagnose 
reading achievement. Their validity for diagnosing reading
disability has been questioned. Validity studies of this nature have
focused on the Stanford-Binet Intelligence Scale, Marianne 
Frostig Developmental Test of Visual Perception, the Bender- 
Gestalt, the Wepman Test of Auditory Discrimination, and the 
Rorschach Test. 

A study of the auditory memory span sub-test of the 
Stanford-Binet Intelligence Scale, Form-L was undertaken by 
Rose (1958) to determine its diagnostic potential for poor 
readers. Rose found a strong relation between poor reading in 
general and poor performance on this test. This was consistent
with the findings of the WISC studies cited later.
However, Rose cautioned against thinking that all poor readers
are deficient in auditory memory span and, therefore, need
instruction in improving it. Rose reached this conclusion because
one-third of the remedial readers in the study did not have any 
more difficulty on the auditory memory span sub-test than did 
the students of average reading achievement. 

Contrary to Rose’s findings, Bond and Fay (1950) discov- 
ered that poor readers, when compared to good readers, tended 
to perform better on memory items on the Stanford-Binet. 
They also cautioned against using their finding for diagnostic
purposes because there was a lack of consistently superior per- 
formance on these memory items by the poor readers. 

The Marianne Frostig Developmental Test of Visual Per- 
ception was found to be of little value in predicting the reading 
achievement of second-grade students (Olson, 1966). The low 
correlation reported by Olson between the scores on the Frostig 
test and chronological and mental age should also raise some 
questions concerning the validity of the test for determining per- 
ceptual development. 

A validity study carried out by Krippner (1966) of the Min- 
nesota Percepto-Diagnostic Test resulted in quite different find- 
ings. In the Child Study Center at Kent State University, the 
diagnostic categorization of the Minnesota test was compared to 
the diagnostic findings of reading clinicians. The diagnostic 
categories were based on the major etiological factors behind 
reading disability according to Rabinovitch (1959). The three 
categories were: organic, primary, and secondary retardation. 
In 24 reading clinic cases, the Minnesota Percepto-Diagnostic 
Test and the reading clinicians’ diagnoses agreed in all but two 
cases. This finding is amazing when it is realized that the read- 
ing clinicians were graduate students in training and the three 
diagnostic categories were not defined in specific behavioral 
terms. This report of highly positive concurrent validity would 
have been more useful if the tests or procedures used by the
reading clinicians had been described more fully.

Contradictory findings concerning the use of the Bender-Gestalt
as a diagnostic instrument of reading achievement were
reported in two recent studies. Keogh (1965), in a longitudi- 
nal study of 127 children, evaluated the use of the Bender- 
Gestalt at the kindergarten level as a predictive measure of 
reading achievement and at the third-grade level as a diagnostic 
test of reading performance. The third-grade criterion measures 
of reading were teacher ratings and the California Reading Test. 
Keogh concluded that the Bender-Gestalt was useful in identify- 
ing potentially good readers but was of limited value as a diag- 
nostic test of reading difficulty. 

The second recent study on the Bender-Gestalt was carried 
out by Parrish (1962). Parrish administered the Bender- 
Gestalt to a group of first-grade male readers and a group of 
first-grade male non-readers of average intelligence. The results 
indicated that the two groups did not differ significantly in copy- 
ing or in discrimination on the perceptual phase of the test. 
Significant differences were found, however, in the reproduction 
of the Bender designs. This was attributed to interpretive fac- 
tors. Parrish concluded that the clinical utility of the Bender- 
Gestalt with young children was confirmed and the test seemed 
capable of discriminating between reader and non-reader first- 
grade boys. 

The use of the Rorschach Test as a diagnostic tool for analyzing
reading behavior was investigated by Knoblock (1965), who
administered it to 62 second-grade children. The children
were divided into good (upper quartile) and poor (lower quar- 
tile) readers on the basis of their Gates Advanced Primary 
Reading Test scores. In general, Knoblock found that the Ror- 
schach Test failed to discriminate between the good and poor 
readers. As a result, he rejected the hypothesis that good read- 
ers generally function at a more mature level on all psychologi- 
cal measures. 

The relation between WISC scores and reading

Many recent studies
have attempted to use the sub-test patterns of the Wechsler
Intelligence Scale for Children to identify and diagnose students
with reading disability. Deal (1965) reviewed some of these
studies (Altus, 1956; Burks & Bruce, 1955; Coleman & Rasof,
1963; Graham, 1953; Hirst, 1960; Kallos & Grabow, 1961;
Muir, 1962; Neville, 1961; Paterra, 1963). In order to compare
these studies, Table 3 has been devised. The table also contains
additional studies on the WISC not included in Deal’s review
(Richardson & Surko, 1956; Sheldon & Garton, 1959; Dockrell,
1960; Robeck, 1960; McClean, 1968; McCleod, 1965; Reid &
Schoer, 1966).

A number of problems should be borne in mind as the studies
are compared:

1] Many of the studies used quite small and restricted 
populations. 

2] The criteria used to determine reading retardation 
differed considerably from study to study. 

3] Subjects’ age ranges varied in some studies, while 
they were quite restrictive in others. 

4] The intelligence scores of the subjects in some of the 
studies were controlled so that some of the compari- 
sons were between high and low scores on intelligence 
tests rather than between good and poor readers of 
similar intelligence levels. 

5] A number of the studies included only males. 

6] The criteria for determining the significance of high 
and low scores on the sub-tests of the WISC were not 
similar in each study. 

7] Several studies only compared verbal to performance 
scores for good and poor readers and were not con- 
cerned with WISC sub-test patterns. 

Despite these limitations, it appears from Table 3 that fairly 
consistent patterns of WISC scores for retarded readers are dis- 
cernible from the sixteen studies included. Inspection of the 
patterns of scores indicates that poor readers perform at a lower
level than they score on the rest of the WISC on the following
sub-tests: Information, Arithmetic, Digit Span, Coding, and
sometimes Vocabulary. The poor readers usually perform better
on the following sub-tests than on the rest of the WISC battery:
Comprehension, Picture Completion, Block Design, Picture
Arrangement, and Object Assembly.

Table 3 The sub-test patterns of WISC scores for retarded readers

Several of these researchers (Paterra, 1963; Burks & Bruce, 
1955; Hirst, 1960) suggested that the WISC sub-test patterns 
could be employed to determine the type of remedial reading 
program needed by these students; but not one of the studies 
undertook to validate this suggestion. Future research could 
use at least two approaches to interpreting these sub-test pat- 
terns. The first would be to relate the good and poor perform- 
ances on the WISC tests to the lack of opportunity to learn 
which was caused by the inability to read. For example, Infor- 
mation, Arithmetic, and Vocabulary sub-tests are probably most 
affected by the broadening of knowledge through reading; how- 
ever, this analysis does not account for the poor performance on 
Digit Span and Coding. The sub-tests on which the poor read- 
ers scored relatively high would probably not be increased by 
further experience through reading. Such an analysis of the 
WISC patterns would hypothesize that poor readers are not de- 
ficient in particular abilities caused by the reading disability, but 
rather, that poor reading skill has prevented the student from de- 
veloping in certain areas. 

A second path for future research might involve relating the 
WISC performance patterns to perceptual-motor development. 
Performance on the Arithmetic, Digit Span, and Coding sub-tests appears
to rely on auditory and visual discrimination as well as immedi- 
ate memory span. These same skills also seem to be highly re- 
lated to learning to read. While some perceptual-motor skills 
are required on the Picture Completion, Block Design, and Ob- 
ject Assembly, these sub-tests tend to be more gross in nature 
and more closely related to concrete objects in contrast to the 
more abstract symbols and numbers of Arithmetic, Digit Span, 
and Coding. 

One last problem should be mentioned in analyzing WISC
sub-test patterns: researchers have limited their studies to
pupils within the average range of mental ability. If it is accepted
that poor readers are at a serious disadvantage on WISC sub- 
tests such as Vocabulary, Information, and Arithmetic, it logi- 
cally follows that if such students are to attain average intelli- 
gence test scores, they must perform at a higher level on several 
of the remaining tests. It may be that some of these studies 
are, therefore, not comparing good and poor readers of average 
intelligence; rather, they are comparing bright students who are
poor readers with average students who are average readers. 
The sub-test patterns of the WISC have been shown to be re- 
lated to reading retardation. What is now needed are studies 
which attempt to relate this test performance to instructional 
programs in order to investigate the validity of these scores for 
planning effective remediation. 

A note on the use of standardized tests 

From the preceding review of research on tests diagnosing 
reading ability, several key problems are apparent. First, there 
is no consistent definition of the sub-skills constituting reading 
on present standardized tests, thereby leading to confusion con- 
cerning their discriminant validity. This confusion has filtered 
down to the classroom where teachers have been left in a quan- 
dary about how to proceed with instruction. Although availa- 
ble diagnostic tests seem to be quite limited, teachers can still 
plan effective reading programs which meet the needs of their 
students. This has been the case and will continue to be the 
case as long as the practitioner is aware of the limitations of the 
various diagnostic tests and realizes that the tests probably at 
best represent an obstacle course for the students. The best di- 
agnosis takes place when the teacher brings “enough sophistica- 
tion to the test sessions to evaluate pupils’ reading abilities and 
weaknesses as they succeed or fail” on the various test items 
(Eller & Attea, 1966, p. 566). 

Adequate criterion measures of reading achievement need to
be delineated before diagnostic testing can be improved.
Standardized tests usually compare a student’s performance to 
some norm groups. What is needed are tests which compare a 
student’s performance to some criterion of adequate reading. 
For example, at present, only vague notions exist about what 
“good” third-grade reading is. Until such criteria, or perhaps 
more importantly, some criteria for determining reading levels 
adequate for “effective” citizenship for adults can be devised, 
the value of diagnostic tests will continue to rely more on the 
sophistication of the reading teacher than the sophistication or 
the intrinsic value of the tests. 

Finally, it appears that the attempts to use psychological 
tests such as the WISC, the Frostig, the Bender-Gestalt, and 
the Rorschach in diagnosing reading achievement have been 
largely futile. While correlations between poor reading and 
performance on these tests have been found, the reasons for 
them have never been determined. Unless researchers begin to 
validate these correlations against remedial programs or some 
other valid criterion, attempts to use psychological tests as diag- 
nostic reading tests should be abandoned. Instead, efforts 
might best be channeled toward improving diagnostic testing 
through a more valid sampling of reading behavior rather than 
through an assessment of behaviors which are related to reading 
in some unknown manner. The test consumer can increase the 
validity of his diagnostic attempts in two ways. First, when se- 
lecting a standardized group or individual test, he can carefully 
match his teaching objectives to the test objectives. Secondly, 
he can develop informal procedures to assess students’ reading 
behaviors in the classroom situation. 

Informal measurement of reading 

Informal approaches to assessing reading achievement in- 
clude a wide range of methods such as measuring student use of 
the library, determining out-of-school reading habits, using 
teacher-made check lists of reading skills, and diagnostic
evaluations made by both the student and his parents. There have
been very few studies which have investigated the validity and 
usefulness of most of these approaches. Research on informal 
reading assessment has focused on either comparing informal 
reading inventories with standardized reading tests (Sipay, 
1964; Patty, 1965; Williams, 1963), validating students’ self- 
evaluations (Purcell, 1963; Spaights, 1965), or comparing 
teacher judgments of students’ reading with their performance 
on standardized reading tests (Kermonian, 1962; Henig, 1949; 
Hitchcock & Alfred, 1955). 

Because informal approaches use such a wide variety of pro- 
cedures to assess reading performance over a number of differ- 
ent occasions, it is not surprising that they are more reliable and 
more valid measures than standardized reading tests. After all, 
the more behavior which is sampled, the more likely the assess- 
ment is to be accurate. However, a word of caution is needed 
on the use of informal approaches. Evaluations based on infor- 
mal means are more reliable estimates of the student’s true 
reading behavior than standardized reading tests precisely be- 
cause they are not based on the comparison of any one student 
to any other student. If a teacher wishes to compare student 
performance with that of other students, informal inventories 
are inappropriate because they evaluate each student individu- 
ally under different conditions. In this case, standardized tests 
should be used since they have consistent administrative proce- 
dures. 

However, when they are used to plan instruction, informal 
measurement procedures have more validity than standardized 
reading tests. In using informal assessments of students’ read- 
ing in daily classroom situations, the teacher can evaluate the 
students’ ability to apply their reading skills to various learning 
tasks. In this way, not only can the teacher learn about the de- 
velopment of students’ basic reading skills, but he can also learn 
about student attitudes toward reading tasks, their reading
interests, and their ability to apply their reading skills.

Informal reading inventories

The use of informal reading inventories (IRI’s) for determining
students’ functional reading levels and diagnosing reading
skills is a fairly well established practice. For an excellent
discussion of informal reading inventories, the reader is referred
to Johnson and Kress’ (1966) work, Informal Reading Inventories.
The inventory, or IRI as it is known, is composed of a series
of graded paragraphs which are usually read aloud by the
examinee to the examiner; comprehension checks follow each
paragraph reading. As the student reads, the examiner keeps
track of such errors as mispronunciation of words, unknown
words, reversals, repetitions, substitutions, word by word
reading, and other word call errors. On the basis of these readings,
the teacher determines the students’ functional reading levels.
Some informal reading inventories occasionally include additional
paragraphs to be read silently, an assessment of the size
of a student’s sight vocabulary, a procedure for assessing oral
language development, and other measures which have been
developed to assess aspects of the student’s reading development
which the teacher feels are vital to reading success. These
inventories range from tests which teachers devise for use in their
own classrooms to more standardized inventories developed for
use in reading clinics. There are even more carefully standardized
inventories which are published for sale, like the Standard
Reading Inventory.

These informal inventories are highly regarded for their use- 
fulness in determining students’ reading levels. Johnson 
(1960) pointed out the difference between standardized tests 
and informal tests on the basis of the information they convey. 



Standardized tests rate an individual’s performance as com- 
pared to the performance of others. By contrast, an infor- 
mal inventory appraises the individual’s level of competence 
on a particular job without reference to what others do. 

(1960, p. 9) 

Johnson suggested that the classroom teacher determine
appropriate levels for independent and instructional work solely
through the use of informal reading inventories. 

Despite the accepted worth of informal reading inventories, 
there are several problems that limit their use. First of all, the 
criteria for evaluating IRI performance are quite subjective;
reading specialists have suggested various criteria for evaluating
reading performance (Betts, 1940; Sipay, 1964). Secondly,
the performance a student exhibits is quite dependent on the
reading selection chosen for a particular IRI. For example, a
story may be selected from a third-grade reader for inclusion on 
an IRI because it is supposed to represent third-grade reading 
difficulty. However, the reading difficulty of any short selection 
taken from a basal reader may be quite different from the read- 
ing level it is supposed to represent. A third problem in using 
IRI’s relates to the examiner’s knowledge of the basic reading 
process and his ability to record errors and make judgments 
about reading performance. 

Research concerned with informal reading inventories has 
focused primarily on the relation between IRI’s and standard- 
ized reading tests as well as the usefulness of the IRI as a diag- 
nostic tool. In particular, research has concentrated on assess- 
ing the accuracy of informal versus standardized testing proce- 
dures in determining an individual’s reading level. Betts 
(1940) attempted to study the accuracy of standardized as 
compared to informal procedures for assessing reading grade 
placement. He administered five silent reading tests (the
Gates Reading Survey, the Stanford Achievement Test: Reading,
the Durrell-Sullivan Reading Achievement Test, the
Sangren-Woody Reading Test, and the Iowa Silent Reading
Tests: Advanced) to fifth graders and compared their performance
on them with that on an author-constructed informal reading
inventory. Betts offered the following conclusions:

1] The results from one test are not highly comparable 
with the results secured from another test.

2] None of the standardized reading tests used provide
an accurate index to the levels at which reading
instruction should be initiated for the low achievers.
For example, 11 per cent of the fifth graders
experienced difficulty in typical third-grade reading
activities, but only one of the standardized tests used
placed these pupils below the third-grade level. The 
tests did identify the low achievers whose reading 
difficulties needed further analysis. 

Another early study comparing performance on a standard- 
ized reading test — the Gates Reading Survey — with that on an 
informal inventory was undertaken by Killgallon (1942) with a 
group of 211 fourth graders. The various functional reading 
levels determined by Killgallon’s reading inventory are pre- 
sented in Table 4. Among other things, Killgallon found that 
the IRI for his group of fourth-grade children yielded three 
functional reading levels:

the mean Independent Reading grade level was .86, 
the range was 0 to 5.0; 

the mean Instructional Reading grade level was 3.16, 
the range was 0 to 9.0; 

the mean Frustration Reading grade level was 6.3, 
the range was 4.0 to 9.0. 

In addition, Killgallon found that the reading ability of the 211 
students in the original sample on the Gates Reading Survey 
ranged from 2.0 to 10.4, while the mean was 4.6. Killgallon 
pointed out that, on the average, pupils tend to score about one 
year higher on the standardized reading test than their instruc- 
tional level determined by the informal reading inventory. As 
an example of the difficulties encountered in using the Gates
Reading Survey to identify reading levels for students at the 
lower end of the scale, Killgallon reported that a student who
scored 2.8 on the Gates survey was found to be utterly incapable
of reading a pre-primer when tested with the IRI.

Sipay (1964) attempted to obtain objective evidence on the 
extent to which the level of reading achievement as measured by 
standardized reading achievement test scores differed from the 
functional reading levels as estimated by an author-constructed 
informal reading inventory. He administered the Metropolitan 
Achievement Test: Reading, the Gates Reading Survey, and 
the California Reading Test to 202 subjects from eight fourth- 
grade classes. The students were given an individually admin- 
istered informal inventory which was based upon selections 
from the Scott, Foresman reading series. The criteria for de- 
termining the functional reading levels are presented in Table 
5. The statistical analysis of the test scores indicated the fol- 
lowing results: 



Table 5 Criteria used to estimate functional reading levels by Sipay
(1964)

Level                  Accurate word    Minimum
                       pronunciation    comprehension

Instructional
  Cooper, Criteria 96  96%-99%          60%
  Betts, Criteria 90   90%-95%          60%

Frustration            less than 90%    less than 50%

1] When the more stringent criteria were used to estimate 
the instructional level, all three standardized tests 
tended to overestimate the instructional level by ap- 
proximately one or more grade levels. 

2] When Criteria 90 was used, the mean score on the 
Metropolitan test was 0.11 grade levels higher, while 
the Gates survey overestimated the Criteria 90 instruc- 
tional level by 0.29 of a grade level, and the mean of 
the California test was 1.02 higher than that of the 
Criteria 90 instructional level.

3] The standardized tests, when compared with the
frustration level criteria, were significantly lower in the
case of the Metropolitan and Gates tests.

4] A comparison of the means of the frustration level 
and the California test revealed that the California
Reading Test underestimated the frustration level
by 0.24 of a grade level. This difference was signifi- 
cant at the .05 level. 

In conclusion, Sipay (1964, p. 268) stated: 

These findings suggest that it is impossible to generalize 
as to whether standardized reading achievement test scores 
tend to indicate the instructional or frustration level. 
Rather, it appears that in making such judgments, one must 
consider the standardized reading test used and the criteria 
employed to estimate the functional reading levels. 
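
The dependence on the criteria employed can be made concrete with a short sketch of the Table 5 thresholds. The function name, the dictionary keys, and the "unclassified" label are illustrative assumptions. Table 5 does not say whether one frustration condition alone is sufficient, nor how to treat scores that fall between the bands; the sketch assumes either condition is enough and reports the undefined combinations as unclassified.

```python
# Accuracy bands (per cent) for the instructional level, as given in Table 5.
INSTRUCTIONAL_BANDS = {
    "criteria_96": (96.0, 99.0),  # Cooper
    "criteria_90": (90.0, 95.0),  # Betts
}


def classify_level(word_accuracy, comprehension, criteria="criteria_90"):
    """Classify one graded paragraph from two percentages (0-100).

    Assumptions not stated in Table 5: either frustration condition alone
    is sufficient, and any combination the table leaves undefined is
    reported as "unclassified".
    """
    low, high = INSTRUCTIONAL_BANDS[criteria]
    if word_accuracy < 90.0 or comprehension < 50.0:
        return "frustration"
    if low <= word_accuracy <= high and comprehension >= 60.0:
        return "instructional"
    return "unclassified"


print(classify_level(93, 70))                 # Criteria 90 applied
print(classify_level(93, 70, "criteria_96"))  # Criteria 96 applied
print(classify_level(85, 80))                 # accuracy below 90 per cent
```

The same performance (93 per cent accuracy, 70 per cent comprehension) meets the instructional criteria under Criteria 90 but not under Criteria 96, so any comparison with a standardized score shifts with the criteria chosen.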

In still another study of the relation between results on in- 
formal reading inventories and standardized tests, Glaser (1964) 
compared the functional reading levels of retarded seventh-grade
and advanced third-grade students to their scores on
the Gates Reading Survey. All of the students in both groups
had scored between 5.0 and 5.9 on the Gates survey. The find- 
ings of Glaser’s study indicated: 

1] The instructional levels of the advanced and retarded 
readers were consistently lower than the levels of their 
standardized reading test scores with a slightly larger 
spread evident for retarded readers. 

2] Sixteen (52 per cent) of the retarded seventh-grade 
readers reached frustration level in passages of fifth- 
grade difficulty; 17 (57 per cent) of the third-grade 
pupils met the criteria for frustration at this level. 

3] The instructional levels were consistently below the 
standardized reading test scores for the two groups. 

4] Providing reading instruction and materials for students
on the basis of standardized reading test scores
could hinder their progress and possibly affect their
attitude toward reading.

McCracken (1962) compared the performance of 56 sixth-grade
pupils on the Iowa Every-Pupil Tests of Basic Skills, Test
A: Silent Reading Comprehension to the reading comprehen- 
sion and vocabulary scores on an informal reading inventory 
which included both oral and silent reading. The three levels of 
performance on the informal reading inventory were the imme- 
diate instructional reading level, the maximum instructional 
reading level, and the word recognition level. McCracken con- 
cluded that the use of standardized test scores to determine the 
level of instruction would place 63 per cent of the students at a 
frustration reading level and suggested that the standardized test 
scores should be lowered by two grades. He urged that this 
score be used to determine instructional level. If McCracken’s 
recommendations had been followed with the students in his
study, only four per cent would have been reading books which 
would be too difficult and seven per cent would have been read- 
ing books which would be too easy. McCracken’s suggestion, 
however, only has validity for the Iowa Every-Pupil Tests of 
Basic Skills (which he used in the study) and the reading mate- 
rials which formed the basis for his informal inventory. 
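
McCracken's adjustment amounts to a simple subtraction, sketched below. The function name and the floor at zero are assumptions added for illustration; the two-grade reduction is the only value taken from the study, and, as noted above, it applies only to the test and materials McCracken used.

```python
def adjusted_instructional_level(standardized_grade_score, reduction=2.0):
    """Lower a standardized grade score by `reduction` grades.

    The floor at 0.0 is an assumption for this sketch, so that very low
    scores do not produce a negative grade level.
    """
    return max(0.0, standardized_grade_score - reduction)


# Hypothetical sixth grader with a standardized grade score of 6.4.
print(adjusted_instructional_level(6.4))
```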

A validity and reliability study of a standardized informal 
reading inventory — the Standard Reading Inventory — was car- 
ried out by McCracken (1964). McCracken attempted to de- 
velop the content validity of the inventory by controlling the vo- 
cabulary, sentence length, content, and style of the reading se- 
lections. Construct validity was studied by administering the 
oral reading paragraphs contained in the inventory to 664 chil- 
dren in grades one through six. The significant differences 
found in student performance as paragraphs of increasing diffi- 
culty were read were quite substantial. Reliability evidence for 
the alternate forms of the inventory was obtained by having two 
examiners administer alternate forms of the Standard inventory 
to sixty elementary school children. Correlations of reading
levels between the two forms for the independent, instructional,
and frustration reading levels ranged from .86 to .91. The cor- 
relations between the two forms for the eight reading sub-skills 
measured by the inventory ranged from .68 for word recogni- 
tion errors to .99 for vocabulary in isolation. From the results 
of this study, it certainly appears that the Standard Reading In- 
ventory should validly determine students’ functional reading 
levels. In addition, the reliabilities between alternate forms of 
the inventory suggest that they could be used interchangeably in 
determining growth during a reading program. 
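
Alternate-form reliability of this kind is usually expressed as a correlation between the levels assigned by the two forms. The sketch below computes a Pearson correlation by hand; the six pairs of reading levels are hypothetical and are not McCracken's data, and the source does not state which correlation coefficient he used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical instructional levels for six children on two alternate forms.
form_a = [1, 2, 3, 4, 5, 6]
form_b = [1, 2, 3, 5, 4, 6]

r = pearson_r(form_a, form_b)
print(round(r, 2))
```

A coefficient near 1.0 means the two forms rank children almost identically, which is the basis for treating the forms as interchangeable.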

Seven doctoral dissertations reported since 1961 have com- 
pared performance on informal reading inventories to that on 
standardized reading tests. McCracken’s (1963), Sipay’s 
(1961), and Glaser’s (1964) have already been discussed in 
some detail. In a dissertation, Williams (1963) compared the 
performance of fourth, fifth, and sixth graders on an informal 
reading inventory, based on their classroom basal readers, with 
their scores on the California Reading Test, the Gates Reading 
Survey, and the Metropolitan Achievement Tests: Reading. 
When an informal reading inventory was used which contained 
selections from basal readers with which the students were fa- 
miliar, the standardized tests were found to place students rela- 
tively near their instructional level. This finding is somewhat 
different from those of other researchers. Another result of 
Williams’ study is that the disabled readers showed more stand- 
ardized test versus inventory variance at the instructional read- 
ing level than did normal readers in any one grade. 

Leibert (1965) compared informal reading inventory per- 
formance and scores on the Gates Advanced Primary Reading 
Test for second-grade students. Leibert reported differences in 
grade placement for the two measures, but suggested that these 
differences may be due to the wider range of skills included in a 
group standardized test, while reading as measured by an infor- 
mal reading inventory is more narrowly defined. 

Patty (1965) contrasted the Gilmore Oral Reading Test and 
the Gray Oral Reading Test with IRI performance. Patty 
found that it was impossible to generalize as to whether standardized
oral reading tests indicate the functional reading levels
of children as accurately as informal reading inventories do. 
Because of the economy of administration and the usefulness of 
the information they provided, the Gray Oral Reading Test and 
an informal reading inventory were deemed the most desirable 
instruments for determining functional reading levels. Brown 
(1963) came to a similar conclusion in a study using the follow- 
ing silent reading tests: the California Reading Test, the Metro- 
politan Achievement Test: Reading, the Stanford Achievement 
Test: Reading, the Iowa Every-Pupil Tests of Basic Skills, and 
the Gates Reading Survey. Brown found no consistent rela- 
tionship between performance on these tests and on informal in- 
ventories. However, the Brown and Patty studies are not di- 
rectly comparable: Brown used standardized silent reading tests 
while Patty used standardized oral reading tests. 

In reviewing the findings of the studies cited above, several 
generalizations appear appropriate. First, it is important to re- 
member that the purposes of standardized tests and informal in- 
ventories differ. Most publishers of standardized tests do not 
suggest that the grade score norms be used as indicators of the 
levels at which reading instruction should be provided. Rather, 
the standardized tests are designed merely to compare students 
to each other in terms of their reading skills. Secondly, per- 
formance on one informal reading inventory based on only one 
set of materials or set of basal readers in all likelihood will differ 
from performance on another reading inventory based on an- 
other set of materials. If an informal reading inventory is based 
on the materials used in classroom instruction, students perform 
better on that inventory than they would when presented with 
an inventory based on an unfamiliar set of materials. At the 
same time, estimates of student performance on classroom instructional
materials are probably of greatest value to teachers.
Third, any comparisons between IRI performance and stand- 
ardized test scores are entirely dependent on: 1] the stand- 
ardized test used, 2] the materials used to construct the IRI, 
3] the criteria used to evaluate performance, and 4] the ability
and skill of the examiner in recording errors and judging
performance on the inventory. Finally, it seems that informal 
reading inventories are not as useful at the upper grade levels as 
they are at lower grade levels. Evidence (Wells, 1950) has already
been cited which indicates that at the upper grade levels
oral and silent reading abilities may be quite different skills. As
it has been pointed out: 

above the sixth-grade level, certain limitations inherent in 
available reading textbooks render the estimates of grade 
placement based upon them probably less reliable and less 
refined than those of the standardized tests at corresponding 
levels. Prominent among the limitations referred to is the 
lack of a carefully graded vocabulary and the absence of 
any satisfactory control of comprehension difficulties arising 
from sources other than vocabulary difficulty such as, sen- 
tence length, sentence structure, extent of reference to sub- 
jects foreign to the experiential background of the pupil, 
and unrestricted use of fiction, or words for which concrete 
referents are unavailable. (Killgallon, 1942, p. 180) 

Diagnosis through self-appraisal 

In addition to the informal reading inventory, another proce- 
dure which has been proposed for diagnosing reading achieve- 
ment is the use of the reader’s self-evaluations. The major re- 
search concerns in this area have focused on the validity of 
self-evaluations — whether reader self-evaluations are useful in 
providing the teacher with added insight into a pupil’s read- 
ing difficulties. While it has been well established that self- 
evaluations are a sound procedure in psychology, it has yet to be 
shown that it is sound practice in evaluating student reading 
abilities. There is also sparse research evidence supporting the 
validity of student self-evaluations in terms of assessing per- 
formance. Purcell (1963) polled college and adult students in 
reading improvement classes to determine the relative impor- 
tance the students assigned to the factors which could have been 
causing them to read slowly. The factors were taken from a
reading workbook and were explained by the instructor. Purcell’s
procedure appeared to limit the number of factors available
for evaluation despite the fact that students were allowed to
include additional factors. The factor which was rated as most 
important and was checked by 645 of the 827 students was backtracking;
following in order of importance were daydreaming,
word-by-word reading, vocalizing, and monotonous plodding. 
The validity of these as separate skills of reading is certainly 
open to serious question. It would be quite surprising if stu- 
dents would be able to identify these skills in other students; it is 
also probable that teachers would, likewise, be unable to do so. 
Certainly, the value of Purcell’s study would have been consid- 
erably enhanced if these student ratings had been related either 
to test performance or teacher ratings. Spaights (1965) actually
did this in his study comparing the self-estimates of eighty
junior high students with their performance on the California 
Achievement Tests. Comparisons were made for each track of 
the school’s four-track system: able class learners (mean I.Q.
116), regular class learners (mean I.Q. 95), modified class 
learners (mean I.Q. 83), and slow learners (mean I.Q. 64). 
Students’ self-ratings in slow learner classes correlated at the
highest level with California Achievement Test reading grades,
.79; the regular class learners, .70; the modified class, .55; and
the able class students’ self-ratings correlated lowest at .36. Several
elements weakened Spaights’ study: foremost was Spaights’
assumption that the California test was reliable for all four 
groups. Perhaps many of the more-able learners scored at the 
upper end of the test scale and, therefore, many of them were 
not being accurately measured by the test because the test was 
not difficult enough for them. The use of teacher ratings would 
have added useful insights into this problem. Another factor de- 
tracting from the study was the questionable practice of employ- 
ing student ratings based on grade score ratings. Spaights did not 
describe the rating sheet, but if product moment correlations 
were used, it is probable that the students were asked to rate 
themselves 7.1, 7.2, and so forth on the reading scale. It is 
highly unlikely that students know the difference between a
seventh- or sixth-grade reading level, much less the difference between
7.1 and 7.2.

A study which examined the usefulness of self-ratings as 
compared to formal evaluations was undertaken by Darby 
(1966). In this investigation, self-referred students and for- 
mally referred students were found not to differ in amount of 
reading growth during a reading improvement program nor 
were they found to differ in the length of time they remained in 
the program. However, at the conclusion of the program the 
self-referred students did score higher on the Brown-Holtzman 
Survey of Study Habits and Attitudes. 

Most of the studies of self-evaluation have failed to relate 
the self-analyses to growth in the areas of identified weakness. 
If a student is able to identify his own reading deficiencies, he 
should then make greater improvements in those areas which he 
has specified as being weak. The comparison of student self- 
ratings to standardized test scores would not seem to be a useful 
approach to studying the value of self-diagnosis. Even if perfect 
correlations are established between these two measures, it 
would not indicate whether the self-diagnoses are more useful 
than the standardized tests; rather, it would show that one 
measurement procedure could be substituted for another. 

Teacher ratings 

Comparisons of teachers’ ratings of student achievement
with standardized test scores have also received some research attention.
Studies of teacher ratings have been concerned primarily
with comparing the predictive validity of reading readiness
tests and teacher forecasts (Kermonian, 1962; Henig,
1949), the ability of teachers to diagnose and classify readers 
(Burnett, 1963; Hitchcock & Alfred, 1955; Preston, 1953; 
Emans, 1964), and teachers’ skill in the selection of reading 
tests (Fisher, 1961; Bauernfeind, 1967). 

Kermonian (1962) compared teacher ratings of reading 
readiness with scores on the Metropolitan Readiness Test.
This study was undertaken to update Kottemeyer’s (1947) find- 
ings which indicated that: 

1] the subjective judgment of teachers as to first-grade 
reading success of children is as valid as results ob- 
tained by standardized tests; 

2] teachers with more than ten years of experience pre- 
dict reading success with greater accuracy than those 
with less experience; 

3] errors in appraisal occur mainly when teachers credit
potential first-grade reading success to children who 
do not later attain this end. 

Kermonian in his study found that teacher ratings and the 
Metropolitan Readiness Test scores correlated .73 and that the
majority of errors which were made by teachers were in the
direction of overrating students. The major weakness of Kermonian’s
study was that no comparison of the teacher ratings or
Metropolitan test scores was made with later reading achievement.
Because of this, the conclusion that the use of reading
readiness tests should be optional and teachers should be al- 
lowed to exercise their own judgment in appraisal is quite un- 
tenable. 

Henig (1949) conducted a study similar to Kermonian’s ex- 
cept that both the readiness test and the teachers’ forecasts were 
compared to later reading achievement. The Lee-Clark Read- 
ing Readiness Test was used in this study. Henig used a five- 
level categorization of readiness ratings for both the readiness 
tests and the teachers’ ratings (excellent, good, fair, poor, very 
poor) and compared these to a five level categorization (as- 
signed grades from A to E) for first-grade reading achievement. 
The results indicated that the teacher ratings were as valid as 
the readiness tests in predicting later reading achievement. 

From these two studies, there seems to be substantial evidence
that teacher forecasts and at least two standardized tests
of reading readiness are highly correlated. There is also some
evidence that these two procedures are equally valid in predict- 
ing later reading achievement. 

Teachers’ ability to make diagnostic evaluations of students’ 
reading performance has been shown to be related to amount 
of training the teachers have had in reading courses, amount of 
teaching experience, and type of college attended (Burnett, 
1963). The studies in which teacher judgments were com- 
pared to standardized reading tests seem to be most dependent 
on the type of test which teachers ratings were being compared 
to and the amount of teacher knowledge of reading education. 

When the comparison tests are general reading proficiency 
tests, teachers’ judgments show a high degree of relationship 
with the tests. Hitchcock and Alfred (1955) found the corre- 
lations presented in Table 6 between English teacher ratings 
and the Stanford Achievement Test: Reading for 101 eighth-grade
students. The correlations indicate that there is much
agreement between the results of the test scores and the teacher
ratings. However, there also is a great deal of trait overlap between
the paragraph meaning and word meaning categories,
indicating, therefore, that the diagnostic proficiency of the tests
and the teachers’ ratings was somewhat limited.

Table 6 Correlations between English teacher ratings and Stanford
Achievement Test scores (Hitchcock & Alfred, 1955, p. 423)

Test rating          Paragraph    Word       Average
                     meaning      meaning    reading

Paragraph meaning    .74          .75
Word meaning         .73          .79
Average reading      .78          .83        .83

Several studies (Preston, 1953; Emans, 1964) have found
that the more experience teachers have with making diagnostic 
evaluations, the less agreement their ratings have with diagnos- 
tic tests. Preston (1953) found that elementary teachers 
tended to classify students as retarded readers when they were
actually reading up to or near capacity. In comparing teacher 
classifications to test scores, Preston divided the reading grade of 
each child by his mental age. He concluded that any student 
whose index or ratio fell below .80 was a retarded reader. In 
two schools in which this procedure was followed, 43 and 60 per 
cent of the normal readers were, according to Preston’s index, 
incorrectly classified as retarded by the teachers. The most serious
deficiency of Preston’s index is that the mental age of
each child was taken from group standardized intelligence
tests. The tests he used were the Kuhlmann-Anderson Intelligence
Tests and the California Test of Mental Maturity. Both of
these tests are highly correlated with reading achievement. 
Therefore, a student who was a poor reader probably scored 
poorly on the intelligence test (which would certainly be ex- 
pected) and, therefore, would not be classified by the index as a 
retarded reader. In discussing his findings, Preston indicated 
that this may have occurred in a number of instances. 
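
Preston's index and the objection raised against it can be illustrated with a short sketch. It assumes, since the source does not say, that mental age has been converted to the same grade-equivalent scale as the reading grade; the pupil values and function names are hypothetical.

```python
def preston_index(reading_grade, mental_age_grade):
    """Ratio of reading grade to mental age (both assumed to be grade equivalents)."""
    if mental_age_grade <= 0:
        raise ValueError("mental age must be positive")
    return reading_grade / mental_age_grade


def falls_below_cutoff(reading_grade, mental_age_grade, cutoff=0.80):
    """True when the index falls below the .80 cutoff described in the text."""
    return preston_index(reading_grade, mental_age_grade) < cutoff


# Hypothetical pupil reading at grade 3.0 whose mental age, measured without
# any reading demand, would correspond to grade 5.0.
flagged_with_unbiased_estimate = falls_below_cutoff(3.0, 5.0)

# The same pupil when a group intelligence test that depends on reading
# yields a depressed mental age corresponding to grade 3.5.
flagged_with_depressed_estimate = falls_below_cutoff(3.0, 3.5)

print(flagged_with_unbiased_estimate, flagged_with_depressed_estimate)
```

Under these assumed values the pupil is identified only when the mental age estimate is independent of reading, which is the weakness of the index noted above.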

Emans’ (1964) study compared remedial reading teachers’ 
rankings of the reading skills with which students needed help 
with the skills indicated by the individually administered Gates 
Reading Diagnostic Tests. The twenty teachers involved in 
Emans’ study each had worked individually for at least 25 hours 
with the two students they rated. Emans found that teachers do 
not perceive children’s individual reading needs according to the 
test results. He concluded that individualized reading pro- 
grams would be doomed from the start unless a standardized 
diagnostic reading test were used to identify the reading skills 
needs of the students. A shortcoming of the study was Emans’ 
failure to describe the procedures used by the teachers in mak- 
ing their diagnostic evaluations — the directions to teachers, the 
definitions of the skills, and the format of the rating procedures 
all would influence the results of the teachers’ ratings. Emans 
also did not discuss the possibility that the lack of agreement 
might not be due to the lack of validity of the teacher evaluations,
but rather might be attributed to the lack of diagnostic validity
of the Gates test.

Perhaps the best criterion for determining the usefulness of 
diagnostic evaluations would be to compare the amount of im- 
provement made by students selected for a remedial program on 
the basis of teacher ratings versus those selected on the basis of 
standardized test scores. A study of this type was conducted in 
Scotland by Lytton (1961). Lytton found that it made no dif- 
ference whether children were selected for remedial reading in- 
struction by teachers’ judgments or by standardized tests when 
the criterion was standardized test score improvement in read- 
ing. 

What kinds of tests do teachers prefer as diagnostic tools? 
How skillful are teachers in selecting tests? There are very few 
studies which provide any answers to these questions. It is 
highly probable that teacher selection of reading tests is related 
to their educational backgrounds and to their teaching responsi- 
bilities. There is some evidence that teachers at the elementary 
and secondary level would like to have reading comprehension 
and vocabulary tests administered to the students they teach 
(Bauernfeind, 1967). Fisher (1961) conducted a study with 
1,041 elementary school children which indicated that “out of 
grade” reading tests were better measures of the reading ability 
of advanced and retarded readers. By “out of grade” tests, 
Fisher meant tests that were used at a higher or lower grade 
level than the level where the publisher suggested they be used. 
He believed that such tests are consistently better suited to the 
actual performance of advanced and retarded readers. In this 
study they provided better discrimination between the abilities 
of advanced and retarded students and contained materials with 
better content validity. Fisher further concluded that “out of 
grade” tests merit more extensive use in cases where a pupil’s
abilities are markedly different from the norm of his particular
grade. Fisher’s results indicated that the selection of tests 
should involve more than merely examining the technical and 
logical properties of the test; it should also entail some understanding
of the abilities of the individual students being administered
the tests.

The informal procedures discussed in the previous section 
all seem to provide useful information for assessing students’ 
reading behaviors. However, the use of any of these tech- 
niques should be limited by an understanding of the strengths 
and weaknesses of each. Informal assessment of students’ 
reading performance should also include an examination of the 
students’ ability to apply their reading skills in content subjects. 
The diagnosis of this ability is extremely important if the 
teacher is concerned about the ultimate objective of reading de- 
velopment — the utilization of text material to further learning. 



Assessing reading in content areas 

The appraisal of students’ reading in social studies, math, 
science, literature, and other subject areas can provide the read- 
ing teacher with relevant diagnostic information about how well 
the student can apply the reading skills he is taught. Such ap- 
praisal can also provide the content teacher with information 
about how a student can be helped to learn more efficiently in a 
given subject area. In an early study of reading skills in the 
content areas, Artley (1944) found that while some relation- 
ship exists between tests of general comprehension and compre- 
hension in the social studies, there is also a high degree of spe- 
cificity in the factors relating to reading comprehension in the 
social studies. A command of the specialized vocabulary of so- 
cial studies was found to be at least as important as knowledge 
of social studies facts on tests measuring knowledge of facts in 
social studies. Several studies since Artley’s early investigation 
have also concluded that comprehension of reading material is 
different in each subject area (Shores, 1960; Maney, 1958; 
Halfter & Douglass, 1960). If this finding is accurate, it means 
that the diagnosis of a student’s reading performance in a con- 
tent area must be concerned with more than his general reading 
comprehension. Students may be performing poorly in academic
subjects not because they lack reading comprehension
abilities in general, but because they lack the specific ability to
apply this skill to various subject areas. The diagnosis of read- 
ing ability, therefore, needs to go beyond an evaluation of gen- 
eral reading power and should examine the reader’s ability to 
apply his reading skills. 

Shores (1960) found that comprehension of science mate- 
rials for sixth-grade students was related to their purposes for 
reading. The purposes Shores established for the reading were: 
1] reading for the main idea and/or 2] reading to keep a se- 
ries of ideas in sequence. Shores found that reading for the 
main idea is more like what is measured by tests of general 
reading achievement than is reading for a series of ideas in se- 
quence. 

In an earlier study with fourth, fifth, and sixth graders, 
Shores and Saupe (1953) investigated whether the type of 
reading comprehension demanded of a student in each content 
area differs qualitatively beyond the primary grades. It was 
discovered that the kind of reading used in grades four, five, and 
six for problem-solving in science has “a large factor in common 
with mental ability and general achievement as these are com- 
monly measured and yet is somewhat unique in a manner which 
cannot be accounted for by these generalized factors” (Shores & 
Saupe, 1953, p. 157). They added that better testing instru- 
ments were needed to define the nature of this unique variance 
and its relation to general reading comprehension. 

Further support for the hypothesis that reading comprehen- 
sion is a specific ability related to specific purposes for reading 
and various subjects was reported in a study with 513 fifth- 
grade students (Maney, 1958). Maney administered an author- 
constructed test of science reading comprehension, the Gates 
Reading Survey — Level of Comprehension and the Pintner 
General Ability Tests. Intercorrelations between the test items 
were then examined. Of great importance is that the correlations
of literal reading comprehension with the critical science reading
test items ranged from -.15 to +.47. This finding lends considerable
credence to Maney’s conclusion that critical reading of
science materials cannot be predicted from general reading tests 
or from a test of literal reading comprehension. 

Each of the studies cited thus far has emphasized the need
for tests of reading comprehension in each subject area. 
Maney (1958) developed such a test for use in science classes 
with fifth graders. Researchers in other content areas have also 
attempted the development of reading comprehension tests for 
specific subjects. Halfter and Douglass (1960) developed a
test designed to measure a student’s general competence in
reading skills peculiar to the field of commerce. Their test cor- 
related highly with successful performance in a business school. 
Comparative validations of the test were provided by correlating 
high school grades and the Ohio State University Psychological 
Test. The Ohio State test correlated with later grades .64 as 
did the Commerce Reading Comprehension Test. The two 
tests and high school grades provided a multiple correlation of 
.77 with first semester grades in business school. However, 
Halfter and Douglass failed to indicate the amount of variance 
contributed by high school grades, which limits their conclusion
that the Commerce Reading Comprehension Test is a useful 
predictor of later grades in business courses. 
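
One way to read these coefficients is to square them, since a squared correlation gives the proportion of criterion variance accounted for. The sketch below applies that arithmetic to the two figures reported by Halfter and Douglass; the function name is illustrative.

```python
def variance_explained(correlation):
    """Proportion of criterion variance accounted for by a correlation of this size."""
    return correlation ** 2


# Either test alone correlated .64 with later grades.
single_test = variance_explained(0.64)

# Both tests plus high school grades gave a multiple correlation of .77.
combined = variance_explained(0.77)

# Gain from adding the other two predictors to a single test.
increment = combined - single_test

print(f"one test alone:       {single_test:.2f}")
print(f"all three predictors: {combined:.2f}")
print(f"gain from the rest:   {increment:.2f}")
```

The gain cannot be divided between high school grades and the second test without the correlation of high school grades with later grades, which is the figure the authors did not report.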

Comprehension of vocabulary in a particular academic area
has also been suggested as an important predictor of success in 
that content field. The reading vocabulary sub-test of the Cali- 
fornia Reading Test — Upper Level is divided into specific sub- 
ject matter areas. While there have been few studies which 
have examined the validity of tests of vocabulary in specific sub- 
jects, several researchers (Johnson, 1952; Wyatt & Ridgeway, 
1958; Dunlap, 1951) have concluded that subject-oriented vo- 
cabulary tests reveal students’ weaknesses in understanding the 
vocabulary of textbooks in that field. 

Mary E. Johnson (1952) constructed a vocabulary test 
consisting of 150 multiple-choice items designed to test fifth 
graders’ understanding of vocabulary in six content fields: arithmetic,
geography, history, science, health, and literature. The
words used in the test were taken from the fifth-grade books 
which the students used for daily study. Because the pupils 
tested did not seem to be equipped to deal with the vocabularies 
of the texts used in content fields, it was concluded that a pro- 
gram of word enrichment was needed. Similar findings have 
been reported with high school students (Wyatt & Ridgeway, 
1958; Dunlap, 1951). 

Belden and Lee (1961) compared Dale-Chall readability 
scores of five general biology textbooks adopted for use in Okla- 
homa high schools. Three hundred fifty-seven tenth graders in 
six Oklahoma high schools were then administered the Nelson- 
Denny Reading Test and students’ reading ability and the read- 
ability level of the textbooks were compared. Only one of the 
five biology texts was found to have a readability level suitable 
for at least fifty per cent of the students who were using it. 
This conclusion is limited, however, by the lack of reliability evidence
for the Dale-Chall formula and the Nelson-Denny test.
The lack of agreement between readability level and students’ 
reading ability could have been due to either the invalidity or 
unreliability of these two measures, but the findings do suggest 
that a complete diagnosis of a student’s reading ability must in- 
clude an assessment of his ability to read textbook material. 
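
The kind of comparison Belden and Lee made can be sketched as follows. The regression weights are those usually given for the 1948 Dale-Chall formula and are not stated in the studies reviewed here; the conversion of a raw score to a grade level, which relies on a published correction table, is left out, and all input values are hypothetical.

```python
def dale_chall_raw_score(pct_unfamiliar_words, avg_sentence_length):
    """Raw score from the 1948 Dale-Chall formula (weights assumed, see above).

    pct_unfamiliar_words: percentage (0-100) of words not on the Dale list.
    avg_sentence_length: mean number of words per sentence in the sample.
    """
    return 0.1579 * pct_unfamiliar_words + 0.0496 * avg_sentence_length + 3.6365


def share_reading_at_or_above(student_grade_scores, text_grade_level):
    """Proportion of students whose reading grade score meets the text's level."""
    if not student_grade_scores:
        raise ValueError("need at least one student score")
    able = sum(1 for score in student_grade_scores if score >= text_grade_level)
    return able / len(student_grade_scores)


# Hypothetical passage sample: 20 per cent unfamiliar words, 20-word sentences.
raw_score = dale_chall_raw_score(20.0, 20.0)

# Hypothetical tenth-grade class and a text assumed to sit at grade 10.0.
class_scores = [8.5, 9.0, 10.2, 11.0]
suitable_share = share_reading_at_or_above(class_scores, 10.0)

print(round(raw_score, 4), suitable_share)
```

A text would meet the fifty per cent criterion mentioned above when the second value is at least 0.5, although the result is only as trustworthy as the readability estimate and the test scores behind it.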

The number of investigations related to the measurement of 
reading ability in content areas indicates that many researchers 
and teachers feel there is a need for tests of specific reading 
skills. Most of these studies (Maney, 1958; Halfter & Douglass,
1960; Shores & Saupe, 1953) are related to attempts to
measure reading comprehension as it relates to a specific sub- 
ject. Others (Johnson, 1952; Wyatt & Ridgeway, 1958; Dunlap, 
1951) have pointed out the need for measuring students’ vocab- 
ulary ability in each subject area so that the needed vocabulary 
instruction can be provided and students can learn more effec- 
tively from textbooks in each subject. 

How successful have these attempts been? First of all, 
there is a serious lack of research related to the basic components
of reading comprehension and their relation to various
subjects. The majority of investigations have relied on the correlation
coefficient for their analyses. While such a procedure
does indicate that two variables are related, it does not provide
the reasons underlying such a relation. A given reading comprehension
test in science may be related to later success in science
not because the test is a test of specific “science reading”
ability, but because the student who has had past experience
with science not only achieves at a high level on such a test but
also has a high probability of performing well in a science class.

More basic research has been conducted on the elements 
composing general reading comprehension than on those com- 
posing reading comprehension in specific subject areas. This 
research, reported previously in the present monograph, has 
been quite equivocal in the validation of attempts to measure 
specific reading skills. Hunt (1957) cautioned against the use 
of measures of specific reading comprehension skills: 

However, it seems to this writer that the whole question of 
the construction of diagnostic measures of reading compre- 
hension needs further examination. There have been sev- 
eral efforts to use the procedure of naming the important 
skills of reading comprehension, constructing items designed 
primarily to measure each of the skills as labelled, and then 
studying the responses to the items by a sample or group of 
examinees. This conventional procedure has usually been 
least exhaustive in the two most important steps: namely, 
item construction and the analysis of student responses to 
the different sets of items. (1957, p. 169) 

Hunt obviously was pleading for more careful definitions of the 
skills which are to be measured. If a test constructor suggests 
that reading comprehension is different in science than it is in 
social studies, he must describe exactly how they differ. It is not 
enough for him to build two reading comprehension tests, one 
based on science material and one on social studies material. 
Furthermore, attempts to validate such tests must be related to 
students’ responses. Correlating a test of science reading ability 
with grades in science is not a valid procedure for examining the 
unique qualities of reading comprehension in science. By
studying students’ responses, it may be possible to determine if 
the student goes through a different mental procedure in com- 
prehending science material than he does in comprehending so- 
cial studies material. 

Finally, there is a lack of tests on the market which measure 
reading achievement in specific subjects. If content area teach- 
ers desire information regarding students’ reading performance 
in that content area, it would be most useful for them to develop 
informal reading inventories designed to measure students’ skill 
in learning from text material. To determine reading levels and 
specific skill weaknesses, Eller (1965, p. 188) has suggested on 
the college level a procedure of using informal “tests which in- 
clude samples from the texts used in the basic freshman courses 
in science, social science, and English.” Eller further recom- 
mended that this informal approach be used by the reading 
teacher or the subject matter teacher to diagnose other skills: 
“. . . he can easily begin to ‘specialize’ in the development of 
special collections of exercises for the appraisal of note-taking 
skills, evaluation skills, abilities concerned with the organization 
of information and locational and reference skills” (1965, p. 
188). 

A note to the practitioner 

This chapter has attempted to review some of the more im- 
portant studies dealing with the problems of using formal and 
informal tests to assess students’ reading achievement. It has 
also tried to point out explicitly the implications this research 
has for the test consumer: what devices might be most helpful
for diagnosing skills and at what levels they are most appro- 
priate. The major conclusion, if any, from the preceding re- 
view must, of course, be that much research is needed before 
definitive suggestions for classroom practice can be made. 
However, such a conclusion is scarcely helpful to the practi- 
tioner who is faced with immediate problems of how to assess 
an individual student’s achievement. If anything, though, the
research should have said to him that no one method can solve 
his problems. Knowledge about the diagnosis of reading 
achievement is not so scant that the teacher need be paralyzed. 
Given a variety of procedures, teachers can make a reasonably 
accurate assessment of students’ skills, capabilities, and needs. 
Both standardized and informal tests can help in grouping stu- 
dents for instruction, determining reading levels, and diagnosing 
reading achievement. 

The most efficient procedure for determining instructional 
groupings or for comparing students in general reading develop- 
ment is to use a group standardized reading test. The selection 
of the appropriate test should be done by comparing instruc- 
tional objectives to the test objectives and by selecting a test 
which has the broadest coverage. In using the test results for 
comparing students, the teacher should not depend upon grade 
norms; instead, he should rely on standard scores. In addition,
no attempt should be made to use sub-test scores for diagnostic 
purposes. Care should also be taken to make sure that the test 
is not too easy or too difficult for more able or less able stu- 
dents. A standardized test is valid for comparing students only 
when the standardized administration procedures are carefully 
followed for all the students who are to be compared. After 
the teacher has obtained some idea from the standardized tests 
about who the good, the average, and the poor readers are, the 
next step is to determine their functional reading levels. This 
can be done by using standardized reading tests in a procedure 
outlined by Farr and Anastasiow (1969). This procedure is 
based on determining the relationship between a particular 
standardized reading test and an informal reading inventory. 

An informal reading inventory, developed by the classroom 
teacher and based on the classroom instructional materials, 
provides a very useful measure of each student’s ability to read 
at increasingly difficult levels. Most often overlooked in the use 
of informal reading inventories is their use as a daily, continuous
part of reading instruction. By constantly being alert to
each student’s reading performance and applying the criteria for
assessing informal reading inventory performance, the teacher
can adjust the instructional materials to ensure continued student
success. After determining appropriate reading levels for
students, the teacher’s next concern relates to the diagnosis of 
reading skills development. 

The validity of the teacher’s diagnosis of students’ reading 
skills can be increased if he selects or develops measurement 
devices which assess those skills which he has concluded are 
most important for the students’ reading skill development. 
This would mean the teacher would accumulate a collection of 
procedures and tests for the continuous diagnosis of students’
reading achievement. This collection would probably include 
certain sub-tests of group and individual standardized reading 
tests, teacher-developed checklists or tests, and classroom ob- 
servations of students’ behaviors. In using such instruments, it 
is essential that the teacher realize that the instruments are 
being used as criteria tests and are not for the purposes of com- 
paring one student to another. Their value lies in the informa- 
tion they can provide about students’ development in particular 
skill areas. Other measurement procedures such as psychologi- 
cal tests and teacher observations were reviewed in the preced- 
ing chapter. 

How do these fit into a total evaluation program? First of 
all, there is very little evidence that psychological tests provide 
any useful information for diagnosing students’ reading achieve- 
ment. Before their use becomes accepted diagnostic practice in 
the classroom and clinic, their validity needs to be carefully 
studied. However, it should be pointed out that this research 
should develop from questions raised from attempts to use these 
tests. It is, therefore, suggested that the tests should continue 
to be used in controlled situations. Teacher evaluations appear 
to be quite valid and reliable measures of students’ general 
reading development. In this regard, they are most comparable 
to standardized group reading tests and, as with the group tests, 
there is considerable question concerning the validity and reliability
of classroom teachers’ diagnosis of specific sub-skills of
reading. 

The past two chapters have reviewed research studies con- 
cerning the problems of measuring specific reading skills and the 
problems of using various procedures for assessing reading skill 
development. With these areas clarified to some extent, the 
next chapter proceeds to a consideration of the theories and research
dealing with one of the major uses of measurement devices
in reading: the assessment of growth.

Assessing growth

Student growth in reading skills is the single most important
goal of the reading program. Probably the most valuable con- 
tribution which measuring devices can make to reading instruc- 
tion is that of providing a reliable and valid assessment of this 
growth. The need for such assessment cannot be overempha- 
sized: most of the elements within the reading program — the 
teaching procedures, the grouping practices, the curriculum 
structure, and even teacher capabilities — are evaluated on the 
basis of student growth. While it is not proposed that student 
growth be the sole basis for evaluating the reading program, 
nonetheless it is the single most important variable to consider 
in assessing reading programs. Consequently, a chapter on 
assessing growth in a monograph on measurement and evalua- 
tion in reading needs little justification. 

Research in assessing growth has been sparse, and this in it- 
self has been a major obstacle to improving evaluation proce- 
dures. Too often statements and suggestions are made about 
the value of a particular procedure when there is no research 
evidence to substantiate it. But the scarcity of studies in assess- 
ing growth does not prevent an intelligent discussion of current 
evaluation procedures and there are a number of studies which 
stress the need for improving present practices. 

The review of research presented in this chapter deals with 
the problems of pre- and post-test measurement. This discussion
applies to assessing student progress at all levels of instruction,
from pre-school to the adult levels. Two areas have been
singled out for special attention: evaluating growth in remedial
reading programs and the use of readiness tests as a means for 
predicting performance. These two areas are given special em- 
phasis because both require extensive use of measurement de- 
vices and because they have commanded more research atten- 
tion than other facets of the reading program. 

Difficulties in assessing growth 

Scores on both informal and standardized tests have, for the 
most part, served as the basis for assessing growth in reading. 
The ways in which these scores have been used as the criterion 
for evaluating growth were best described by McDonald (1964).
McDonald delineated three major methods for evaluating 
growth, all of which are comparative. The first method in- 
volves comparing scores on alternate forms of a test and using 
the difference in performance on the pre- and post-tests as the 
criterion for assessing change. A second method entails taking
the average yearly gains made by a particular group and com- 
paring them with those made by a nationwide norm. The cri- 
terion for growth in this instance is not how the student achieves 
individually in relation to his own past performance, but how he 
does, on an average, in regard to some national norm. The 
third method described by McDonald involves comparing test-retest
scores of a remedial group with those of a control group
other than the national norm group. While the three methods 
for using test scores as the basis for assessing growth described 
by McDonald are the most commonly used ones, it does not 
mean that they are necessarily the most efficient or accurate 
means of evaluating progress. Indeed, McDonald was well 
aware of their limitations. 

The central problem in measuring growth in reading is the 
validity and reliability of methods for assessing student progress. 
Are tests the best instruments for evaluating growth? If they 
are, are alternate forms (pre- and post-tests) useful? Are the 
alternate forms comparable, i.e., do they measure the same or
different skills? Are the nationwide norms established by test
publishers comparable to the group being tested so that the re- 
sults, based on evaluating students’ performance against that of 
the norm, will be meaningful? It would be both tedious and re- 
dundant to review here all those studies which have inappro- 
priately used tests to evaluate growth; rather, it is more useful 
to point only to those studies which demonstrate clearly the 
major validity problems encountered in assessing growth in 
reading.

Valid measurement of the skills taught

The most important decision to be made by the practitioner 
in assessing reading growth is choosing the testing device. The 
practitioner has to be careful that the test he selects validly 
measures what has been taught in the instructional program, 
that it represents the components of reading behavior as defined 
by the instructional program, that the difficulty level of the test 
is appropriate to the group being tested, and that the evaluation 
includes measures of gains over longer periods of time. In 
other words, the practitioner must ask himself whether, given 
the instructional program, the estimate of growth provides the 
information that he needs and whether it provides that information
accurately.

There are several elements which make tests appropriate to 
any given instructional program. The most obvious one is that 
the skills measured by the test be those which were taught in the 
reading program and that those factors deemed constituents of 
reading behavior by the reading program be so considered by 
the test in about the same proportions. No specific research 
studies related to this problem have been located. However, it 
is logical that the measurement of growth would be invalid if the 
testing instrument failed to measure what has been taught. For 
example, if one of the most important outcomes of the reading 
program is the development of critical reading comprehension 
and instruction has been organized accordingly, a test for
measuring growth should be selected which places an equal emphasis
on critical reading comprehension.

A test, even if it has a label which indicates that it may be 
measuring a skill taught in the reading program, may still be un- 
suitable if it is not testing that skill in the manner in which it 
was taught by the teacher. For example, if vocabulary im- 
provement has been developed through using words in context, 
a test would not be a valid measure of vocabulary improvement 
if it presented words in isolation and the examinees were asked 
to select the “correct” synonym from a group of alternatives. 
Related to this is the problem of a test covering not only those 
abilities which have been part of the reading program, but also 
other abilities extraneous to its goals. The single most common 
error is the unconscious inclusion of a speed factor when speed 
of reading is not a goal of the instructional program. The speed 
factor enters through the use of timed tests. Results in any 
pre-test, post-test situation are always influenced because the 
student usually works harder on the post-test knowing that he is 
being evaluated on the basis of the difference between his initial 
and final performance. This Hawthorne-type effect is com- 
pounded when the post-test is a timed test; often the student 
does more work on it regardless of whether he has become a 
more powerful reader or not. The Reed (1956) study demon- 
strates the pitfalls of using timed tests when speed is not an inte- 
gral part of the program. Reed hypothesized that intensive 
training in reading and study skills would yield significant gains 
in reading rate, vocabulary, comprehension, and grade-point 
averages for a group of nursing students. Students were pre- 
tested; following 27 hours of training, post-tests were given. The 
results indicated there was no significant growth in comprehension,
vocabulary, or grade-point averages; but “significant
growth” was reported for reading rate. A possible conclusion
from Reed’s study is that the improvement of the rate scores 
was not the result of the instructional program, but rather it was 
the result of the testing. 

Once the question of the test’s suitability to the content of
instruction has been resolved, it is then necessary to turn to the
test’s appropriateness to the student’s instructional level. While 
a test may validly represent the content of instruction and ac- 
curately portray growth for students at one instructional level, it 
may be quite inappropriate for students at another. For exam- 
ple, an oral reading test may be a useful measure of growth for 
first- or second-grade students because of the relative emphasis 
placed on oral reading at those grade levels and because of the 
need to diagnose the students’ word attack skills. However, for 
average readers at the junior or senior high level, oral reading 
tests would not be useful because instructional emphasis at these 
levels is usually placed on silent reading and, while oral and si- 
lent reading ability are quite highly correlated at the lower 
grade levels, they become quite divergent at the upper grades 
(Gray & Reese, 1957; Wells, 1950). 

Highly related to the problem of the appropriate levels of a 
test is the difficulty of the directions and/or design of a test for 
students at any given level. A test which is too easy or too hard 
provides little information about growth. Fisher (1961) has 
demonstrated that the use of tests which are suggested by the 
publisher for a particular grade level may not be valid for the 
advanced or retarded readers of that grade. For example, on 
the Gates Reading Survey the present author has found in class- 
room experiments that it is possible for students to get a raw 
score equivalent to a grade level of 3.0 by random guessing. If
the teacher is concerned with growth, a test has to be used on
which a student’s score is a valid indication of his reading ability
and not his chance-guessing. 
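
The chance-score problem can be checked with simple arithmetic before a test is adopted. The following is a minimal sketch, assuming a multiple-choice test on which every item has the same number of options and no correction for guessing is applied; the function names and the 60-item, four-option test are hypothetical and are not figures from the Gates Reading Survey.

```python
import math


def expected_chance_score(num_items: int, options_per_item: int) -> float:
    """Expected raw score when every item is answered by blind guessing."""
    if num_items < 0 or options_per_item < 2:
        raise ValueError("need a non-negative item count and at least two options")
    return num_items / options_per_item


def chance_score_sd(num_items: int, options_per_item: int) -> float:
    """Standard deviation of the guessing score under a binomial model."""
    p = 1 / options_per_item
    return math.sqrt(num_items * p * (1 - p))


# Hypothetical test: 60 four-option items (an assumption for illustration).
chance = expected_chance_score(60, 4)
spread = chance_score_sd(60, 4)
print(f"chance score = {chance:.1f} items, SD = {spread:.2f}")
```

If the publisher's norm table assigns a grade level to a raw score near this chance value, scores in that region probably say little about reading ability, and an easier level of the test may be the better choice for the less able students.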

One problem which is not intrinsic to test selection, but 
which is critical once the test has been selected, is whether the 
methods used in administering the test permit evaluation of 
long-term retention of gains. A test administered immediately 
after a short-term instructional reading program would reveal 
only limited evidence concerning the retention of any gains 
made by the students. Ray (1965) studied the three- and six-month
retention of gains made following a thirty-hour reading
program for college students. No comparison groups were
used, but the three- and six-month post-test scores resulted in 
performance which was significantly superior to pre-test per- 
formance. If the objective of the reading program is the reten- 
tion of gains, then delayed post-testing procedures similar to 
Ray’s should be adopted. However, as Ray pointed out, reten- 
tion of gains may also be due, in part, to the increased reading 
demands at higher grade levels and the increased maturation of 
students. Ray’s findings were supported by Smith and Wood 
(1955), who found that college students after a lapse of time retained
and possibly improved those aspects of reading which
were emphasized in the reading program. 

Another procedure which might be used to measure the per- 
manency of reading improvement is measuring general aca- 
demic improvement following a remedial reading program. 
The relation between reading gains and academic performance 
is a valid estimate of reading improvement if the reading skills 
related to performance have been a vital part of the reading im- 
provement program. 

The use of alternate test forms 

Is it desirable to select a test with alternate forms to serve
as pre- and post-tests? Davis (1961) believed that it is: he 
argued that if the same form of a test were used more than 
once, a student might remember parts of it on a subsequent trial 
of the test or he might have even inquired about the test’s con- 
tent between testings. Others (Cronbach, 1960), agreeing with 
Davis, have specifically pointed to a “practice” effect. They 
have shown that a student, even if he does not remember spe- 
cific items on a test or look them up during testing intervals, still 
performs better on that test because he has had practice on it in
the form of the pre-test. Curr and Gourlay’s (1960) research 
substantiated this practice effect. They found that when students
at the 9.5 grade level were re-tested at one-, three-, and
six-month intervals, their mean gains were 10.1, 18.2, and 26.9
months, respectively. Similarly, large practice effects were
noted for both the mechanics of reading and reading compre- 
hension at the 7.5 grade level. This study is limited by the 
small sample used, but the results are still rather amazing. If 
the practice effects had been compared to students’ performance 
on alternate forms of the test and if these same large gains did 
not result from performance on the alternate forms, the study 
would have been more conclusive. 

If the theory that practice affects performance when the same
test is re-administered is to be accepted, the assumptions under- 
lying this theory should be examined. The first assumption is 
that a student knows which items he answered incorrectly; the 
second is that he will recall these items after the test; the third is
that he will take the time to find out what the correct response 
is; and the fourth and final assumption is that he will recall the 
question and the correct response at a later testing time. It 
seems highly unlikely that these assumptions are valid for most 
elementary or secondary students. Karlin and Jolly (1965) 
studied the practice effect with 161 pupils in grades four to 
eight. In September the appropriate levels of the SRA
Achievement Series: Reading, Form A, and the California 
Reading Test, Form W, were administered to all subjects. In 
May these same tests were re-administered along with their alternate
forms. After a nine-month interval, there was no difference
between the amount of growth shown by the alternate
forms and that shown by the same test administered a second time. Karlin
and Jolly concluded that their results raise serious doubts about 
the need for alternate forms of a test for measuring growth. Of 
course, their conclusions are limited by the nine-month period 
of comparison used in the study as well as by the relatively 
small sample of elementary school children used. Nonetheless, 
Karlin and Jolly’s study still stresses the need for more research 
along the lines they used, covering varying periods of time and 
using different student populations. 

Should research, however, prove that alternate forms are
useful and necessary, the test consumer is still faced with the
problem of the comparability of the alternate forms for any 
given test — i.e., does the post-test measure the same skills as 
the pre-test? Dotson and Bliesmer (1955) examined the comparability
of Forms A, B, and C of the Diagnostic Reading Test:
Survey Section. The forms were administered to 100 incoming 
freshmen at the University of Texas. It was found that the total 
scores on Forms A and C and the total scores on Forms B and 
C were not comparable. Coates (1968) reported a similar 
study in which Forms A, B, C, and D of the Diagnostic Reading 
Test: Survey Section were given to a group of 63 entering fresh- 
men at St. Petersburg Junior College. Correlations between the 
test forms ranged from .53 to .87. Coates concluded that the 
range of these correlations cast considerable doubt on the equiv- 
alency of the four test forms. It would appear that the results 
of both these studies should provide a basis for a more careful 
examination of the comparability of test forms. 

Even if the statistical equivalency of test forms could be es- 
tablished, there would still be unanswered questions about the 
content equivalency of any two forms. It would be impossible 
for a test developer to control all the variables on a reading test 
from one form to another. The difficulty of the vocabulary, the 
content of the material, and the sentence length and complexity 
are all variables which most test authors attempt to control, but 
for each factor that is controlled, there are several others which 
are uncontrolled. If two forms were exactly equivalent, the re- 
sult would be that the same student getting a certain number of 
items correct on one form of the test would get exactly the same 
number correct on another form of the test. This is almost 
never the case since one test form is usually more difficult than 
another and the tests are equated through statistical procedures. 

With most parallel forms, the tests have been normed on the
same populations or random samples from the same population,
but if this has not been done the equivalency of test forms
even on a statistical basis would also be void.

Validity of norms for measuring growth

If one chooses to evaluate growth in reading by comparing 
the performance of a particular class or group of classes with 
that of a national norm, it is necessary first to be sure that the 
population on which the test has been normed is comparable to 
the class which is being tested. It is always good practice to ex- 
amine carefully the description of the norm population provided 
by the test publisher. Included in such descriptions should be 
all those variables which are relevant to growth in reading abil- 
ity such as socio-economic class, intelligence levels, and geo- 
graphic area. If these variables are not comparable to the 
group being tested or if the information is not supplied by the 
test publisher, the use of the norm data for comparing growth is 
not a valid procedure. 

Another factor which is important for the test consumer to 
consider is how many times during a school year the particular 
test was administered in the course of being normed. In most 
instances, tests are administered only once and the grade norms 
for each month of the school year are interpolated from this sin- 
gle administration. The use of such grade norms obviously is 
based on the hypothesis that reading growth follows a fairly 
even pattern. Bernard (1966) examined this hypothesis by 
studying the applicability of published achievement test norms 
to testing programs taking place at different times during the 
school year. He concluded that children’s growth in achieve- 
ment does not follow a regular growth curve with progress oc- 
curring evenly throughout the school year and no growth occur- 
ring during the intervening summer. Bernard cautioned against 
test publishers’ use of extrapolation to convert spring and fall
testing to a common base. He (1966, p. 275) suggested three 
procedures to overcome these weaknesses: 1] schedule testing 
programs dictated by the norms of the tests to be used, 
2] select a test normed at about the time of year testing is to 
be done, 3] forget the published national norms altogether and 
use only local norms.

Lennon (1951) also cautioned against using norms gathered
at one testing period for predicting achievement at a different 
testing period — the procedure which must be followed if tests 
are normed at only one time during the year. Lennon found 
correlations ranging from only .51 to .69 for two adjacent 
grades from grades two to eight. While these correlations are 
fairly large, Lennon pointed out that there would be a large 
amount of error variance in predicting relative reading achieve- 
ment in any grade from the achievement even in the next grade. 
For example, if the correlation between two tests given in two 
different grades was .60 (this is the median of the correlations 
reported by Lennon), only 36 per cent of the variance in the 
second test performance would be accounted for by the first test 
performance. This leaves 64 per cent of the variance unac- 
counted for. 
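
The arithmetic behind this example is the squared correlation coefficient. The short sketch below repeats the calculation for the lowest, median, and highest correlations cited from Lennon; the function name is an illustrative assumption.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance in one test accounted for by the other."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Lowest, median, and highest correlations cited in the text.
for r in (0.51, 0.60, 0.69):
    explained = variance_explained(r)
    print(f"r = {r:.2f}: {explained:.0%} accounted for, "
          f"{1 - explained:.0%} unaccounted for")
```

For r = .60 this reproduces the 36 and 64 per cent figures given above; even at the top of Lennon's range, roughly half of the variance remains unaccounted for.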

The study of individual pupils’ reading growth patterns has 
shown that this growth has been quite irregular. While the 
rank correlations at two different test administration times have
been shown to be fairly high (Townsend, 1951), the individual
growth patterns of students have been shown to be quite uneven
by Traxler (1950, p. 107) who warned teachers that they 
“should not be disturbed when they find that their pupils fail to 
grow according to the average of the group. Nonconformity to 
the group’s pattern of growth is the rule and conformity is the 
exception.” He also pointed out, as did Lennon, that any de- 
viations from previous testing cannot always be interpreted as a 
reading gain or loss because test scores always contain a certain 
amount of error of measurement. 

Socio-economic factors are also important in the interpreta- 
tion of test norms. MacArthur and Mosychuk (1966) studied 
the predictive validity of ninth-grade academic achievement test 
scores from a variety of aptitude and achievement tests adminis- 
tered in grades three, six, and seven. Test results were grouped 
on the basis of the parents’ socio-economic status. The me- 
dian correlations of all predictors with all criteria scores rose 
from grade three to grade seven for the upper status group and
fell for the lower group. If the achievement test results of children
from lower socio-economic status are compared to the expected
growth from test norms based on those from a relatively
higher socio-economic status, it would seem that the compari- 
sons will not be valid. 



Procedures for assessing change 

A variety of procedures have been suggested for determining 
the amount of growth in reading an individual attains over any 
given period. Harris (1967) has edited a book of readings 
which includes several articles on the problems of measuring 
change as well as a variety of statistical models, both theoretical 
and practical, which are advanced as solutions to these prob- 
lems. Most procedures for measuring change have been de- 
vised to overcome two major obstacles: 1] the fact that most 
reading improvement programs are developed for the poorest 
readers and 2] the relative unreliability of single measures of 
reading ability. 

The first obstacle — the fact that most reading improvement 
programs are devised for the poorest readers in a group — 
means that the procedure for measuring growth entails the fol- 
lowing: 1] administration of a reading test to all students in a 
particular grade or in several grades, 2] selection of those 
scoring at the lowest end of the distribution on the test for a 
reading improvement program, 3] administration of a post-test 
after the reading improvement program, and 4] comparison of 
post-test scores with pre-test scores (original test). Given such 
a method, significant improvement is usually noted and the 
reading program is labelled a success. But is this improvement 
really significant? It is possible to find significant gains for any 
group of students even if no instructional program intervened, if 
their inclusion in the program was based on their having scored 
at the extreme end of the distribution of test scores on the pre-test.
This occurs because of the phenomenon known as regression
toward the mean: there is a high probability that a student
who scores at one extreme of the distribution of scores on a
pre-test will tend to score nearer the mean on subsequent re- 
tests. For a detailed discussion of the regression effect as well 
as suggestions for avoiding it, the reader is referred to Lord’s 
(1967) article. 
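
The regression effect can be demonstrated with a small simulation in which no instruction takes place at all. The sketch below is illustrative only: it assumes each observed score is a fixed true score plus independent random error, a test reliability of .80, and selection of the lowest-scoring fifth of the group; none of these figures comes from the studies reviewed here, and the function name is hypothetical.

```python
import random
import statistics


def simulate_regression(num_students=2000, reliability=0.8,
                        cutoff_fraction=0.2, seed=0):
    """Return (mean pre-test, mean post-test) for the lowest pre-test scorers.

    True scores do not change between testings, so any apparent gain in
    the selected group is produced by measurement error alone.
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5          # observed-score variance is 1.0
    error_sd = (1 - reliability) ** 0.5
    students = []
    for _ in range(num_students):
        true_score = rng.gauss(0, true_sd)
        pre = true_score + rng.gauss(0, error_sd)
        post = true_score + rng.gauss(0, error_sd)
        students.append((pre, post))
    students.sort(key=lambda pair: pair[0])      # poorest pre-test first
    selected = students[: int(num_students * cutoff_fraction)]
    mean_pre = statistics.mean(pre for pre, _ in selected)
    mean_post = statistics.mean(post for _, post in selected)
    return mean_pre, mean_post


mean_pre, mean_post = simulate_regression()
print(f"selected group: pre = {mean_pre:.2f}, post = {mean_post:.2f}, "
      f"apparent gain = {mean_post - mean_pre:.2f} SD")
```

Under these assumptions the selected group shows an apparent gain of roughly a quarter of a standard deviation although nothing was taught, which is why a comparison group chosen in the same way, or a statistical correction, is needed before a program is labelled a success.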

The second major obstacle involved in procedures for evalu- 
ating growth — the relative unreliability of single measures of 
reading ability — derives from inadequate knowledge of what 
sub-skills are involved in reading and how they can be meas- 
ured. For this reason alone, any single measure of reading 
performance is limited by sampling errors (Kingston, 1965). 
While the reliability would seem to be higher when more than 
one measurement procedure is employed, there is little or no 
knowledge of how the various measures should be combined. 
The most widely accepted procedure at the present time is to 
change the raw scores from each test to some type of standard 
scores and then to combine the standard scores. 
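
A minimal sketch of this standard-score procedure follows. It assumes z scores computed within the tested group and an equal weighting of the tests; as noted above, there is little evidence on how the measures ought to be weighted, so the equal weights are an assumption of the example and not a recommendation. The function names and sample scores are hypothetical.

```python
import statistics


def to_z_scores(raw_scores):
    """Convert one test's raw scores to z scores within the tested group."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    if sd == 0:
        raise ValueError("all scores are identical; z scores are undefined")
    return [(score - mean) / sd for score in raw_scores]


def composite_scores(*tests):
    """Average the z scores across tests, student by student (equal weights)."""
    z_by_test = [to_z_scores(test) for test in tests]
    return [statistics.mean(student) for student in zip(*z_by_test)]


# Hypothetical raw scores for three students on two tests with different scales.
vocabulary = [10, 20, 30]
comprehension = [40, 50, 60]
print(composite_scores(vocabulary, comprehension))
```

Converting to standard scores first keeps the test with the larger raw-score range from dominating the composite.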

Are both these obstacles (regression effects and the unrelia- 
bility of single measures) insurmountable? Davis (1961) ap- 
parently did not think so. He offered five procedures for evalu- 
ating reading growth on an individual basis and three for evalu- 
ating growth on a group basis. Davis discussed each of the pro- 
cedures for estimating the reading growth on an individual basis 
in terms of a hypothetical case of a student who made a gain of 
five raw score points from a pre- to post-test. In the first proce- 
dure described, the pre-test score was subtracted from the post- 
test score and compared with the probability of such a change 
occurring by chance. This chance occurrence was based on the 
standard error of measurement of the test. In Davis’ hypotheti- 
cal case, a gain of five points was not significant at the .15 level. 
The second procedure Davis discussed involved using two meas- 
ures in the pre-test and two measures in the post-test and aver- 
aging the results. Under these conditions, the hypothetical five 
point gain was considered significant because of the increased 
accuracy of measurement. In the third procedure, any number 
of pre- and post-test measures were used to increase the accuracy
of measurement. Davis’ fourth and fifth procedures were
designed to eliminate the effects of selecting a student for a 
reading improvement program on the basis of extreme scores. 
Both methods four and five compensated for the regression ef- 
fect by estimating the improvement for a student scoring at one 
extreme of the distribution. In addition, the fifth method in- 
creased the accuracy of estimation by considering the correla- 
tion of pre- and post-test scores for all the students tested. 
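
The logic of the first two individual procedures can be sketched as follows. Davis's own formulas are not reproduced in this chapter, so the sketch substitutes the conventional standard error of a difference between two scores and a normal approximation; the standard error of measurement of 4.0 raw-score points is a hypothetical value chosen for illustration, as is the function name.

```python
import math


def gain_significance(gain: float, sem: float, measures_per_occasion: int = 1):
    """Return (z, one-tailed p) for an individual's pre-to-post gain.

    Assumes independent, normally distributed measurement errors and the
    same standard error of measurement (sem) on every test used.
    """
    if sem <= 0 or measures_per_occasion < 1:
        raise ValueError("sem must be positive and at least one measure is needed")
    # The error variance of an average of k measures is sem**2 / k, and the
    # gain is a difference between two such averages, so the variances add.
    se_gain = sem * math.sqrt(2 / measures_per_occasion)
    z = gain / se_gain
    p = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z) if there is no true change
    return z, p


# Hypothetical case: a five-point gain on a test with an assumed sem of 4.0.
for k in (1, 2):
    z, p = gain_significance(gain=5, sem=4.0, measures_per_occasion=k)
    print(f"{k} measure(s) per occasion: z = {z:.2f}, p = {p:.3f}")
```

With these assumed figures the five-point gain has a probability of about .19 of arising by chance when one test is given at each occasion and about .11 when two are averaged, which illustrates how the same gain can fail the .15 level in the first procedure and meet it in the second.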

Davis’ three methods for estimating change on a group basis 
were based on the assumption that the students have been ran- 
domly selected. The first method essentially paralleled the first 
procedure advocated for assessing individual growth except that 
group averages for the pre- and post-test scores were sub- 
tracted. The second and third methods compensated for the re- 
gression effect by determining the correlation between pre- and 
post-test scores in the same manner that methods four and five 
did for individuals. These methods devised by Davis are an 
outstanding contribution to solving the problems of increasing 
the reliability of estimates of change and providing statistical 
techniques for removing the effects due to regression. 

Tracy and Rankin (1967) applied the residual gain statistic 
to assessing reading improvement. They pointed out that crude 
gains (the subtraction of pre-test scores from post-test scores) 
“tend to underestimate the progress of superior ‘improvers’ (as
measured by residual gain) and to overestimate the progress of
inferior ‘improvers’” (1967, p. 363). The value of using residual
gain scores is that the tests that are used do not have to be
expressed in equal-interval scales and, more importantly, the
technique removes the regression effect from the measurement 
of improvement. The computational procedure for residual 
gain scores is relatively easy to follow: 

1] Convert both pre- and post-reading test scores to z 
scores for each student. 

2] Compute the correlation between pre- and post-test 
raw scores.

3] Obtain predicted post-test z scores by multiplying the
correlation coefficient by the pre-test z score for each 
student. 

4] Subtract the predicted post-test z score from the ob- 
tained post-test z score for each student. 
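
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Tracy and Rankin's own worksheet: the function names and the sample scores are hypothetical, and population standard deviations are assumed for the z scores.

```python
import statistics as st

def z_scores(scores):
    """Step 1: convert raw scores to z scores (population SD assumed)."""
    mean = st.mean(scores)
    sd = st.pstdev(scores)
    return [(x - mean) / sd for x in scores]

def pearson_r(xs, ys):
    """Step 2: correlation between pre- and post-test raw scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

def residual_gains(pre, post):
    """Steps 3 and 4: observed post-test z minus predicted post-test z."""
    z_pre, z_post = z_scores(pre), z_scores(post)
    r = pearson_r(pre, post)
    return [zp - r * zq for zq, zp in zip(z_pre, z_post)]

# Hypothetical raw scores for six students.
pre_scores = [12, 18, 25, 30, 34, 41]
post_scores = [20, 19, 33, 31, 45, 44]
gains = residual_gains(pre_scores, post_scores)
```

By construction the residual gains average zero and are uncorrelated with the pre-test scores, which is the sense in which the regression effect has been removed.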

The result of using this procedure is that a student who at- 
tained a residual gain score of 0 would have achieved exactly 
what was expected of him; a student with a residual gain score
of +1.0 would have improved approximately one standard deviation
above his expected progress. However, the procedure
does have weaknesses: in order to use it, a large number of stu- 
dents must be tested and it is of no value when only one or two 
students have been administered pre- and post-tests. Another 
factor limiting the use of the residual gain statistic lies in its in- 
terpretation. Tracy and Rankin suggested a procedure for 
evaluating and changing residual gain scores to course grades; 
however, their procedure is based on the assumption that a stu- 
dent who scores very high on a pre-test may regress to some ex- 
tent on the post-test; should he regress too much, he would be 
penalized. Complications in public school situations would cer- 
tainly develop if such a student had scored far above the rest of 
the class on a pre-test and then had fallen back considerably on 
the post-test while still scoring among the top 25 per cent of the 
class. If the above procedure were used, it is possible that this 
student could receive a low grade despite his relatively strong 
standing. A thorough discussion on the reliability of residual 
change is presented in the works of Glass (1968) and Traub 
(1967). 

Considerations in estimating growth 

The research in measuring reading growth has not been sub- 
stantial, but even the knowledge afforded by research is not 
having an effect on the evaluation of reading growth. There are 
still many problems to be solved before the valid and reliable
measurement of reading growth will be possible. In the meantime,
the reading practitioner is faced with the necessity of determining
the effectiveness of his reading program. The five steps
listed below, derived from the studies cited in this chapter, 
should serve as a guide in evaluating growth. While these steps 
do not solve all of the problems of measuring change, they will 
increase the reliability and validity of present assessment prac- 
tices. 

1] The practitioner should carefully define the reading 
skill or skills being taught and select a measuring in- 
strument or several instruments that are operational 
definitions of these skills. 

2] If test norms are used for comparisons, the test user
should be sure that the norm group matches the group
being tested on all important factors related to growth
in reading. Developing local norms is, for most purposes,
the best procedure.

3] Measurement procedures should be used under con- 
ditions as closely approximating those of the teaching 
situation as possible. If instruction has been designed 
to produce a generalization of the skills, testing should 
be done under those conditions to which this skill will 
generalize. 

4] If students have been selected for a reading program 
on the basis of their performance on the lower ex- 
tremes of test score distribution, some procedure such 
as the residual gain score should be applied to remove 
regression effects. 

5] Change scores should be interpreted
cautiously. The irregular growth curves of individuals 
indicate that reading improvement is uneven and that 
measurement in reading always involves some error. 

There are two areas that should be more closely examined 
by research. First, there is some contradiction concerning
whether to use alternate forms or to reuse the same test.
Secondly, the problems of combining teacher evaluations with 
test evaluations should be thoroughly explored. If teacher and 
test evaluations could be combined, it would most likely lead to 
more reliable measurement. 

Measuring growth: two unique cases 

The general problems and procedures for measuring reading 
growth have now been discussed and the issues raised are appli- 
cable to the measurement of growth in all areas. However, be- 
cause of a concern evidenced by the large number of research 
studies in these two areas and because of several problems 
unique to these areas, the measurement of reading growth for 
retarded readers and the measurement of growth at the reading 
readiness level merit special consideration. 

Assessing growth for retarded readers 

Those who face the task of evaluating the reading growth for 
students in remedial reading programs face the same task en- 
countered when working with students who progress at a normal 
rate. There are, however, a number of specific problems pecu- 
liar to assessing the progress of retarded readers, the most im- 
portant of which is the selection of an appropriate test (Fisher, 
1961; Glaser, 1964). A test designed for average sixth-grade 
students is, in all likelihood, inappropriate for sixth-grade stu- 
dents who are seriously retarded in reading ability. For exam- 
ple, a sixth-grade retarded reader might obtain a third-grade 
level score by chance even if he could read only at a first- or 
second-grade level. Upon re-testing, after a semester of inten- 
sive remedial help, he might again score at the third-grade level, 
but this time the score might be an accurate index of his actual 
reading ability. Despite rather substantial reading growth, the
test scores would indicate no improvement in reading ability.
Most standardized reading tests do not indicate the score that
can be achieved by chance; but
the teacher should determine what a chance score might be and, 
if a student gets a score at or below the chance level, a different 
test should be administered to that student. Cronbach (1960, p. 
49) provides a formula for determining a chance score which 
teachers might find useful. 
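
Cronbach's formula is not reproduced here. As a rough sketch, one common approximation takes the expected chance score on a multiple-choice test to be the number of items divided by the number of answer choices per item. The function names and the 40-item, four-choice test below are hypothetical.

```python
def expected_chance_score(n_items, n_choices):
    """Expected number right from blind guessing on every item."""
    return n_items / n_choices

def needs_different_test(raw_score, n_items, n_choices):
    """True when a score is at or below the expected chance level."""
    return raw_score <= expected_chance_score(n_items, n_choices)

# Hypothetical 40-item test with four answer choices per item.
chance_level = expected_chance_score(40, 4)
retest_low_scorer = needs_different_test(9, 40, 4)
retest_high_scorer = needs_different_test(25, 40, 4)
```

A score at or below the chance level says little about what the student can actually read, which is why a different test is recommended in that case.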

Another problem area in measuring reading growth for re- 
tarded readers is the evaluation of change scores. Using dis- 
crepancies between mental age and reading age is a common 
procedure for selecting participants for remedial programs. 
Those students who evidence the most gain in bringing their 
reading ages closer to their mental ages are considered to be 
making the best progress. The weakness of this procedure is
that scores on intelligence tests quite often improve as significantly
as reading test scores after remedial reading instruction.
Frost (1963) found significant correlations between
improvement in reading test scores and group intelligence test 
scores for eight to ten year olds who had been in a remedial 
reading program. He concluded that because of the high corre- 
lations between intelligence tests and reading tests, intelligence 
tests are of little use in predicting reading gain. A more serious 
point was made by Frost: when students are excluded from a 
remedial reading program because of low intelligence test 
scores, some of the students excluded could perhaps have prof- 
ited more than some students who were included. In his study, 
Frost found that 29 students who had made the greatest gains 
had the lowest intelligence test scores. Certainly, the present 
widespread practice of using intelligence test scores as a criterion
for selecting participants for a remedial program should be carefully
examined.

Woodcock (1958) attempted to resolve this problem by de- 
veloping a test designed to duplicate as nearly as possible the 
process of learning to read so that performance on the test 
might be truly indicative of a student’s ability to profit from
remedial instruction. Woodcock concluded that his test had predictive
value in selecting cases for remedial reading instruction
and was of greater value than the usual procedure of selecting
students for remedial programs on the basis of discrepancies be- 
tween reading capacity and reading achievement. However, as 
Frost indicated, there are very few comparative studies which 
have examined predictive procedures for determining success in 
remedial reading. 

Several authors have examined the most common method of
determining growth for retarded readers: subtracting pre-test
scores from post-test scores and using the differences
as evidence of progress. Bliesmer (1962) has pointed out that
an assumption underlying this procedure is that the children in 
remedial programs have been selected because they have not 
been making normal progress. He compared three methods for 
evaluating progress for retarded readers: 1] determining gains 
by the typical method of finding differences between pre- and 
post-test scores, 2] comparing remedial program gains with 
average yearly gains made by the remedial students before they 
were enrolled in the remedial program, and 3] finding differ- 
ences between reading potential and reading achievement levels 
(potential-achievement gaps) at the beginning and at the end of 
a remedial program. The improvement shown by method one
was about equivalent to what might be expected for normal
readers; when the change scores were compared to yearly
gains made before the remedial program (method two), the
gains were from one and one-half to four times greater; the
potential-achievement gap differences (method three) did not
show as significant an improvement as the other two methods.

Libaw, Berres, and Coleman (1962) suggested a method for
evaluating the effectiveness of remedial treatment which is very
similar to Bliesmer’s second method. The six steps outlined by
Libaw, Berres, and Coleman are: 1] obtaining measures
of achievement prior to treatment, 2] computing the rate of 
learning prior to treatment, 3] extrapolating to predict 
achievement after a time interval, 4] obtaining a measure of 
achievement after treatment has been under way for an interval, 
5] comparing the predicted measure with the actual achievement
measure, and 6] computing a test of significance on the difference
between the predicted and obtained achievement measures.
There are several weak points in this procedure. The
first involves the assumption that the measures of rate of learn- 
ing before remedial instruction are reliable. As was pointed 
out previously, the use of grade level tests for seriously retarded 
readers may lead to unreliable assessments of reading perform- 
ance. Another shortcoming is that the procedure would proba- 
bly necessitate the use of several standardized reading tests 
which have been normed on various populations. It would be 
invalid to assess growth by using one standardized test as a pre- 
test and a different one as a post-test. The reasons for this in- 
clude the differences in the skills measured by each test and the 
different populations used for norming each test. The problem 
caused by having different norming populations could be over- 
come if local norms were developed for each of the tests. 
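
The six-step procedure described above might be sketched as follows. Several details are assumptions made only for illustration: scores are treated as grade equivalents, the learning rate is taken to be linear from a hypothetical entry level of 1.0, and a paired t statistic stands in for the unspecified test of significance. The data are invented.

```python
import math
import statistics as st

def predicted_achievement(pre_score, years_of_schooling, interval_years,
                          entry_level=1.0):
    """Steps 1-3: prior rate of learning, extrapolated over the interval."""
    rate = (pre_score - entry_level) / years_of_schooling  # gain per year
    return pre_score + rate * interval_years

def paired_t(observed, predicted):
    """Steps 5-6: t statistic for the mean observed-minus-predicted gap."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    return st.mean(diffs) / (st.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical grade-equivalent scores for four students after five years
# of schooling, retested after half a year of remedial treatment (step 4).
pre = [3.0, 2.5, 3.5, 2.0]
observed = [3.8, 3.0, 4.1, 2.9]
predicted = [predicted_achievement(p, 5.0, 0.5) for p in pre]
t_stat = paired_t(observed, predicted)
```

A positive t statistic here would mean only that the group exceeded its own extrapolated trend; the weaknesses noted above, especially the unreliable pre-treatment rate, apply equally to this sketch.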

As in all reading evaluation, another difficulty in assessing 
the growth of retarded readers is the lack of means to measure 
the long-term effects of instruction. Much short-term improvement
which has appeared quite significant when tests are administered
immediately following remedial instruction is found to
be non-existent when students are tested at a later date. This
was investigated by Shearer (1967) who compared a group of 
students who received follow-up remedial instruction with a 
group who did not. At the conclusion of the remedial program, 
the mean reading grades of the two groups were about equal; 
but the group that received follow-up instruction performed
significantly higher at a later testing period. Children who did not
attend the special reading classes but who had been recommended
for them were also tested at follow-up testing time. These
children scored at the same level of achievement as the group 
that had received the remedial program but did not receive 
follow-up instruction. Others, such as Balow and Blomquist
(1965) and Preston and Yarington (1967), have studied long-range
procedures for evaluating the effects of remedial instruction.
In these studies, however, the jobs in which subjects were
employed were used as the criteria for determining the long-range
success of remedial instruction.

There are two specific problems brought up earlier under 
the discussion of general problems in assessing reading growth 
which should be reiterated here in relation to retarded readers. 
The first involves the use of test norms. Retarded readers form 
an atypical population and comparing their growth with that of a 
normed typical population is completely inappropriate. It is 
questionable that standardized tests should be used for them in 
the first place. Secondly, in planning a remedial program the 
reading teacher often develops a program which is based on the 
specific reading problems of the students in his class. This 
should place a stronger emphasis on carefully selecting a test 
based on the objectives of the instructional program for each in- 
dividual student. The best evaluation of this growth, when in- 
struction is on such an intensive and highly individual basis, is 
the student’s performance in the daily task of mastering reading 
skills. Any measurement procedures along this line would be 
highly related to the specific objectives of the instructional pro- 
gram: the norm comparisons would be quite appropriate (the 
student would be compared only with his own previous learning 
rate and to what he had learned previously) and the evaluation 
would be reliable and accurate since a larger than usual sam- 
pling of behavior could take place. There is only one assump- 
tion underlying this procedure: the teacher must be knowledge- 
able about the development of reading skills. 

Reading readiness: predicting early achievement 

Reading readiness tests pose unique problems for measure- 
ment. They are most commonly used to determine if a child 
has sufficient command of skills necessary to begin formal read- 
ing instruction. Given this function, these tests become instru- 
ments of assessing not only the child’s capabilities, but also his 
growth in these capabilities. Thus, they enable the test user to 
predict how well a given student will progress in developing his
reading skills. Another major use of reading readiness tests is
diagnostic: they are used to pinpoint those skills which the student
needs to develop further. The predictive validity of readiness
tests has been studied quite extensively; however, the diagnostic
validity of the tests has received little attention. The
basic conclusion from these various research efforts seems to be
that scores on readiness tests have a fair amount of predictive
validity (Bremer, 1959; Henig, 1949; McCall & McCall, 1965),
but there is almost no evidence that the increased teaching of 
these skills will ensure success in learning to read (Barrett, 
1966). This is probably due to the failure of readiness tests to 
assess many of the more important habits and attitudes related 
to reading readiness. 

The tests most predictive of reading ability seem to be those 
which are most similar to the act of reading (Barrett, 1965). 
This is not surprising since a test should be a sample of the be- 
havior which it is supposed to measure; furthermore, a test 
which predicts best is one which is most like the behavior it is 
supposed to predict. The best example of this is that the con- 
sistently single best predictor of future school grades is past 
school grades. In comparing seven pre-reading tasks — recogni- 
tion of letters, matching words, discrimination of beginning 
sounds in words, discrimination of vowel sounds in words, dis- 
crimination of ending sounds in words, shape completion, and 
copy-a-sentence — with reading test scores at the end of first 
grade, Barrett (1966) found that recognition of letters was the 
best single predictor. Of the seven tasks, the recognition of let- 
ters most closely resembles actual reading. Matching of words 
also seems to possess many of the components prerequisite to 
reading; however, in this task the child does not have to name 
the shape he has visually perceived as being different, as he does 
in naming letters. As Barrett cogently pointed out, the finding 
that the naming of letters is a relatively good predictor of begin- 
ning reading is not unique and student performance on this task 
has no validated diagnostic value. He further suggested that
this skill is perhaps an indication of the child’s early and broad
experiences with written materials and, therefore, “it should not 
be inferred from this study that teaching children to recognize 
letters by name will necessarily ensure success in beginning 
reading” (1966, p. 463). 

Nash (1963) also found that tests which most closely resem- 
bled reading were the most predictive. The Metropolitan 
Readiness Test, selected items from the Stanford-Binet Intelli- 
gence Test, a sociometric technique, the Draw-a-Man Test, 
Learning Rate of Words Inventory, New Gestalt Test, and the 
Maturity Level for Reading Readiness were administered to 
132 first-grade children at the beginning of the school year. 
The criterion test, the Gates Primary Reading Test, was admin- 
istered the last week in February. Nash concluded that the pre- 
dictor tests which measured specific aspects of the reading pro- 
cess were the best predictors of future reading success. 

Weiner and Feldman’s (1963) study of the Reading Prog- 
nosis Test also provides ample evidence for predictive validity 
of tests resembling reading. One of the best predictors from 
this test was a sub-test called Beginning Reading. It appears 
naive not to assume that a student’s performance on a sub-test 
called Beginning Reading is highly predictive of achievement in 
beginning reading. Actually, Weiner and Feldman’s study could
be considered a concurrent validity rather than a predictive
validity study.

Weintraub (1967) reviewed eighteen recent studies related 
to the ability of readiness measures to predict reading achieve- 
ment. The readiness factors included in these studies were a 
numbers sub-test, a visual discrimination test, an auditory dis- 
crimination test, the Bender-Gestalt test, a test of visual-motor 
skills, the Draw-a-Man Test, a verbal fluency test, a measure of 
speech patterns, and length of attention span. The test which 
seemed to be the best single predictor of reading achievement 
according to these studies was the numbers sub-test of the
Metropolitan Readiness Test. This same conclusion was reached
by Abbott (1963) who found that the numbers sub-test of the
Metropolitan Readiness Test was one of the best predictors of 
reading achievement. 

The majority of the predictive validity studies, e.g., Henig 
(1949) and Mattick (1963), have indicated that the ability of 
readiness tests to predict reading achievement is better than 
chance and that this prediction can be improved by the inclu- 
sion of other measurement procedures including teacher obser- 
vations. In the conclusion of his review, Weintraub (1967, 
p. 557) pleaded for the development of more highly refined 
readiness tests: “A survey of the literature on prediction, then, 
leads us to conclude that there is an urgent need for the develop- 
ment of better measures or batteries of measures than we now 
have. This development calls for creativity on the part of re- 
searchers and reading teachers in general. New directions need 
to be investigated.” Some of the directions Weintraub suggested 
include measurement of attention span, oral language, and children’s
self-evaluations. Self-evaluation by children from various
sub-cultures was also stressed as an especially important
research area.

What are some of the factors which seem to affect the pre- 
dictive validity of readiness tests? Those that have been re- 
ported in the research literature include socio-economic status, 
sex differences, and personality differences. One that has not 
been studied at all is the effect of the instructional program in 
reading. Here again, the matching of test objectives to the ob- 
jectives of the instructional program constitutes a major prob- 
lem neglected by research. It seems logical to conclude that 
readiness tests will predict reading performance better when the 
instructional program follows the same pattern as the test. For 
example, it could be hypothesized that an auditory discrimina- 
tion test might predict reading success best when the instruc- 
tional program is heavily oriented toward a phonics approach. 
Interpretation of the predictive validity studies on reading readi- 
ness would also be improved if the authors would describe the 
content of the reading programs which intervened between the
readiness test as the pre-test and the reading achievement test as
the post-test. 

The particular sub-culture from which a child comes appears 
to be an important variable in the predictive validity of readi- 
ness tests. After administering a series of individual and group 
readiness tests to 105 Negro, white, Puerto Rican, and oriental 
first-grade pupils who were considered to be culturally different, 
Loper (1965) concluded that group tests did not adequately 
measure reading readiness for children from these backgrounds. 
Conflicting findings were reported by Weiner and Feldman 
(1963) who developed a readiness test designed to measure 
language, perceptual discrimination, and beginning reading 
skills. Comparative correlations between the readiness test and 
later reading achievement for lower- and middle-class children
were found to be quite similar. The correlation between Weiner
and Feldman’s Reading Prognosis Test and the Paragraph
Reading Test of the Gates Primary Reading Test for lower-class
children was .72; for middle-class children, the correlation was
.77. When the Sentence Reading Test of the Gates Primary 
Reading Test was the criterion test, the correlations were .68 
for lower-class children and .74 for middle-class children. 
Another study of cultural bias in readiness tests was conducted 
by Standish (1959). Standish found that predictions from 
American readiness tests, when used with children in British 
schools, frequently were inadequate because the norms estab- 
lished on American populations rarely went as low as the age 
levels required for beginners in Britain. 

The value of providing separate norms for each sex was dis- 
cussed in Chapter 1 where it was concluded that doing so was 
neither useful nor necessary for reading achievement tests. The 
same conclusion was made in regard to readiness tests by Pres- 
cott (1955) in his study comparing the performance of 7,821 
boys and 7,138 girls from 56 communities throughout the 
United States on the Metropolitan Readiness Test. Prescott 
found that the mean performance of girls was slightly superior 
to that of boys (2.14 raw score points) and that this difference
was statistically significant. However, this significance was 
probably due to the large sample sizes used — the strength of the 
relationship between the readiness scores for boys and girls 
would certainly be extremely high. When the mean scores of 
average boys were compared with those of average girls and 
when the mean scores of under-age boys were compared with 
under-age girls, the mean differences in reading readiness test 
scores were not significant. 
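
The effect of sample size on significance can be illustrated with a simple two-sample z statistic. The sample sizes and the 2.14-point difference come from Prescott's study as reported above; the common standard deviation of 15 raw score points is purely hypothetical, since it is not given here, and the smaller comparison groups of 50 are likewise invented.

```python
import math

def two_sample_z(mean_diff, sd, n1, n2):
    """z statistic for a difference in means, assuming a common SD."""
    standard_error = sd * math.sqrt(1 / n1 + 1 / n2)
    return mean_diff / standard_error

ASSUMED_SD = 15.0  # hypothetical value; not reported in the text above

z_full_sample = two_sample_z(2.14, ASSUMED_SD, 7821, 7138)
z_small_sample = two_sample_z(2.14, ASSUMED_SD, 50, 50)
standardized_difference = 2.14 / ASSUMED_SD
```

Under this assumption the same 2.14-point difference far exceeds the conventional 1.96 cutoff with several thousand students per group but falls well short of it with 50 per group, while the standardized difference stays small in both cases.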

Personality differences could also seemingly affect the pre- 
dictive validity of readiness tests. Lockhart (1965) correlated 
the Personal Adjustment and Social Adjustment sub-tests of the 
California Test of Personality with the Metropolitan Readiness 
Test. The Social Adjustment sub-test correlated with reading 
readiness tests at a negligible level, but the Personal Adjustment 
sub-test and the readiness test correlations were significant at the 
.01 level for the total group (r = .51) and for boys (r = .64). 
It would have been useful if the researcher had gone further and
performed a multiple correlation analysis between these two
tests and later reading achievement. This would have provided
information concerning the added value of administering
a personality test as well as a readiness test for predicting
reading achievement.
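
For two predictors, the multiple correlation suggested above can be computed from the three pairwise correlations with the standard two-predictor formula. In the sketch below only the .51 correlation between the personality sub-test and the readiness test comes from Lockhart's study; the two correlations with later reading achievement (.60 and .40) are hypothetical values chosen for illustration.

```python
import math

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two correlated predictors."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

# r_12 = .51 is reported above; r_y1 and r_y2 are hypothetical.
readiness_alone = 0.60
combined = multiple_r(r_y1=0.60, r_y2=0.40, r_12=0.51)
added_value = combined - readiness_alone
```

With these invented figures the combined prediction is only slightly better than the readiness test alone, which is the kind of information about added value that such an analysis would supply.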

A great number of tests and test procedures have been em- 
ployed as predictors of reading readiness. Included among 
these have been teacher ratings, standardized reading readiness 
tests, intelligence tests, language development tests, perceptual- 
motor tests, projective tests, and tests of auditory and visual 
discrimination. Because of the vast number of studies involving 
each of these procedures, it is impossible to note all of them. 
Therefore, only those studies which appear to be key to the de- 
velopment of the measurement of readiness skills and those 
which have been conducted more recently are covered in the 
present review. 

Two studies — by Kermonian (1962) and Henig (1949) — 
dealing with the comparative predictive validity of teacher ratings
of reading readiness were discussed earlier. Both studies
concluded that teacher forecasts and readiness tests are highly
correlated and are equally valid procedures in predicting later
reading achievement. Similar conclusions have been reached
by other researchers. In particular, Henderson and Long
(1968) found that the variance in reading readiness test scores
is due to quite varied maturational-experiential factors which
can only be explained through teacher ratings of students’
behaviors. Most of the reported studies have discovered that
teacher ratings are relatively good predictors when compared
with other measurement procedures. Zaruba (1968) found
that the measure most closely related to reading grade placement
scores given by teachers at the end of first grade was a test of
letter recognition administered at the beginning of the year; a
teacher evaluation given at the same time was next closest and a
draw-a-man test was third closest. Lack of statistical analysis,
however, limited the interpretation of Zaruba’s study. Mattick
(1963) compared two standardized reading readiness tests, the
Metropolitan Readiness Test and the Lee-Clark Reading Readiness
Test; and two standardized intelligence tests, the California
Test of Mental Maturity and the Lorge-Thorndike Intelligence 
Test; and kindergarten teachers’ predictions with teachers’ as- 
sessments of first graders’ early success in reading. The Metro- 
politan Readiness Test was the best predictor and the kinder- 
garten teachers’ ratings were the second best predictor. Alshan 
(1965) found that the best predictors of first-grade reading, as 
measured by the Word Recognition Test of Gates Primary 
Reading Test, in order of importance as predictors of reading, 
were a test of auditory blending, the Roswell-Chall Auditory 
Blending Test; an experimental consonant combinations test; 
teachers’ ratings; an experimental test of visual discrimination; 
an experimental test of letter names and sounds; and an experi- 
mental test of oral language proficiency. Alshan factor-analyzed 
these measures and found that the teacher ratings loaded 
heavily on a single factor. This led to the conclusion that the 
teachers rated the children in a global fashion. A study which 
investigated teachers’ attitudes toward ratings was carried out
by Standish (1959). In research carried out in England, Standish
discovered that while good teachers reported that reading readiness
was difficult to predict, many teachers considered prediction
of readiness a matter of instinct. The teachers in Standish’s 
study, like their American counterparts, considered students’ 
motivation high on the list of behaviors characterizing children 
who are ready to learn how to read. 

A serious problem affecting the predictive validity of readi- 
ness tests is that the predictions seem to vary for particular pop- 
ulations. Savage (1959) suggested that some of these popula- 
tion variables include sex differences, socio-economic status, 
and the effects of practice in taking tests. There has also been 
some controversy over whether reading readiness tests are most 
useful as tests for predicting future reading achievement or 
whether their greater usefulness is in diagnosing readiness skills. 
A study by Karzen, Suvetor, and Thompson (1965) raised ser- 
ious doubt about the predictive validity of readiness tests. They 
found that the Metropolitan Achievement Tests: Reading pre- 
dicted reading achievement for first-grade children but it was 
highly correlated with the Lorge-Thorndike Intelligence test 
scores. They concluded that “children tend to perform in read- 
ing according to the level of the room in which they are placed, 
regardless of their ability as measured by conventional intelli- 
gence tests” (Karzen, Suvetor, & Thompson, 1965, p. 22). The 
implication of this finding is that the placement of children into 
groups according to their readiness scores tends to encourage 
them to achieve according to the expectation of the group in 
which they were placed. In other words, it would mean that 
placement, not readiness skill development, is the key to success 
in learning to read. Other researchers have urged caution in 
using readiness test scores to place students in reading groups. 
The relationships between the results of the Lee-Clark Reading 
Readiness Test administered at the beginning of first grade and 
the California Reading Test administered to the same pupils at 
the beginning of second grade were studied by Powell and 
Parsley (1961; Parsley & Powell, 1961). Separate correlations
were computed for high, average, and low reading achievement
groups. Because the correlations for the low group were negligible,
the authors concluded that the Lee-Clark test is useful only
as a predictor of the reading achievement for the entire group
and should not be used diagnostically for placing children into
reading groups.

Contradictory conclusions were reached by Bremer (1959). 
Bremer correlated the results of the Metropolitan Readiness 
Tests given at the beginning of first grade with the reading sub- 
tests of the Gray-Votaw-Rogers General Achievement Tests 
given at the beginning of second grade for 2,069 students. A 
Pearson product-moment correlation of the test scores produced 
a correlation of .40. A coefficient of alienation of .92 was then 
computed. This, Bremer said, indicated that the readiness test 
had an index of forecasting efficiency of eight per cent. 
However, the procedure for computing this coefficient of alienation
was never explained. Despite this shortcoming, Bremer’s
conclusion that the readiness tests are not very useful predictive 
instruments seems to be substantiated by his study. What is 
surprising is his conclusion that “readiness tests can be of great 
help in pointing out the deficiencies in the reading readiness of 
individual pupils” (1959, p. 224). Bremer had spent the bulk 
of his research report criticizing the readiness test for lack of 
predictive validity. Here he suggested that they can be used for 
diagnostic purposes, in spite of the fact that he failed to provide 
diagnostic or construct validity evidence to support this recommendation.
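
Although Bremer did not explain his computation, the reported figures are consistent with the conventional definitions of these two statistics. The sketch below assumes the standard textbook formulas, k = sqrt(1 - r^2) for the coefficient of alienation and E = 100(1 - k) for the index of forecasting efficiency. The function names are illustrative only and are not taken from Bremer's report.

```python
import math


def coefficient_of_alienation(r: float) -> float:
    """Return k = sqrt(1 - r^2): the fraction of the criterion's standard
    deviation that remains as error when predicting from a correlation r."""
    return math.sqrt(1.0 - r * r)


def forecasting_efficiency(r: float) -> float:
    """Return E = 100 * (1 - k): the per cent reduction in prediction error
    relative to predicting the criterion mean for every pupil."""
    return 100.0 * (1.0 - coefficient_of_alienation(r))


r = 0.40  # correlation reported by Bremer (1959)
k = coefficient_of_alienation(r)
e = forecasting_efficiency(r)
print(f"k = {k:.2f}, E = {e:.0f} per cent")
```

Under these assumed formulas, a correlation of .40 yields a coefficient of alienation of about .92 and a forecasting efficiency of about eight per cent, which matches the values Bremer reported.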

The use of intelligence tests as predictors of beginning read- 
ing achievement has been the focus of several investigations. 
Mattick (1963) found that group intelligence tests were poorer 
predictors of first-grade reading achievement than reading read- 
iness tests or teacher ratings. A similar finding was reported by 
McCall and McCall (1965) when they compared the predictive 
validity of the Metropolitan Readiness Test (MRT) and the 
California Test of Mental Maturity (CTMM) with the Metropolitan
Achievement Tests: Reading (MAT) and the California
Achievement Test (CAT). The CTMM correlated with CAT
total score .39, while the MRT correlated with the CAT .64. 
The same situation resulted when the MAT was the criterion 
score. The CTMM correlated with the sub-tests of the MAT at 
significantly lower levels than the MRT. 

Group intelligence tests have been shown to be relatively 
poor predictors of beginning reading success, but is this also 
true for individually administered intelligence tests? 
Comparisons of the relative correlations of Metropolitan Readi- 
ness Tests (MRT) and Stanford-Binet Intelligence Scale scores 
to Metropolitan Achievement Test (MAT) scores indicated in- 
conclusive results (Weiser, 1965). Weiser reached this conclu- 
sion after reviewing studies by Hildreth and Griffiths (1933), 
Foster (1937), Wilson and Fleming (1938), Dean (1939), Keis- 
ter (1941), Gavel (1958), Mitchell (1962), and Weiser (1964). 
In Foster’s (1937) study, the Stanford-Binet scores correlated 
with Metropolitan Achievement Test scores .51 and the MRT 
correlated with the same achievement test only .38; in Wilson 
and Fleming’s (1938) study, there were opposite results with 
the Stanford-Binet scores correlating with MAT scores .51 
while MRT and MAT scores correlated .60. In Weiser’s 
(1964) study with a small sample of 24 academically superior 
students, the MRT correlated with MAT scores .58, while 
Stanford-Binet scores correlated only .13. Of some surprise in
Weiser’s study was the additional finding that when a multiple r
of MRT and Stanford-Binet scores was correlated with MAT
scores, the coefficient dropped to .48.

Parsley and Powell (1961), as mentioned earlier, also found 
that Stanford-Binet scores were relatively poor predictors of 
reading readiness. They correlated Lee-Clark Reading Readiness
Test scores with Stanford-Binet test scores for 169 first
graders. The correlations, while positive, were fairly low — 
ranging from .35 to .48. It was concluded that intelligence test 
scores are relatively poor predictors of reading readiness. This 
would have been better substantiated if the tests had been correlated
with later reading achievement rather than with a readiness
test.

The results of the Davis-Eells Test of General Intelligence,
which attempts to minimize the influence of socio-economic differences,
were compared to Stanford-Binet I.Q. scores in terms
of its ability to predict behavior on the Gates Primary Reading 
Test administered at the end of the first grade (Russell, 1956). 
The correlations were .57 for the Stanford-Binet and .21 for the 
Davis-Eells test. Russell (1956, p. 270) concluded that “the 
Stanford-Binet test gives a better prediction of reading progress 
during the first year’s instruction than the Davis-Eells test.” 

A large factor determining performance on intelligence tests 
is language ability. It is perhaps this aspect of intelligence tests 
which explains the partial correlation of intelligence tests with 
beginning reading. Morrison (1962) studied the relationship 
between maturity in the use of various types of sentence struc- 
ture and children’s scores on the Lee-Clark Reading Readiness 
Test. The children’s oral language was recorded during “shar- 
ing” time and was classified according to the complexity of sen- 
tence structure. The complexity of the children’s sentence 
structure correlated with the raw scores of the Lee-Clark .72, 
while the children’s ability to recall incidents in a story read to 
them orally correlated with the Lee-Clark .79. On the basis of 
this study, it is certainly apparent that additional research inves- 
tigating the predictive validity of oral language development is 
needed. 

Research on the use of perception or perceptual-motor tests 
in predicting reading achievement has been inconclusive. In
general, though, these tests do seem to be better predictors of 
achievement than intelligence test scores. Two studies have 
shown that scores on the Bender-Gestalt are adequate predic- 
tors of first-grade reading achievement. Koppitz, Mardis, and 
Stephens (1962) found that the Bender correlated -.59 with the
Metropolitan Readiness Test and -.61 with the Lee-Clark Reading
Readiness Test. The correlations were negative because
the Bender is scored for errors; however, this does not affect the
strength of relationship. The Bender was then compared to the 
two readiness tests in predicting end-of-first-grade achievement 
on the Metropolitan Achievement Test, Primary I Battery, 
Form R. The Bender was shown to be as good a predictor of 
achievement as the two readiness tests. Smith and Keogh 
(1962) supported this finding when they used the Bender-Gestalt
on a group basis for predicting reading achievement; they
concluded:



. . . the group Bender-Gestalt, when rated with an author- 
developed rating scale, is an effective and useful screening 
instrument for evaluating the readiness level of children 
comparable to our sample, who are preparing to enter a 
formal reading program. (Smith & Keogh, 1962, p. 645) 

The importance of perception skills in predicting reading 
achievement was stressed by Scott (1968), who developed a seriation
(perception) test. Scott administered his seriation test to
173 kindergarten children and correlated these scores with their
reading scores on the California Achievement Test administered
at the end of second grade. On the basis of the data, Scott concluded
that a child’s pre-reading capacity to process visual stimuli
is an important aspect of reading readiness. The correlation
between the seriation test score and the California Achievement
Test reading achievement score was .59. Fox (1953) also found
that perception scores were generally adequate predictors of
first-grade reading achievement. He administered a tachistoscopic
perception test, the Metropolitan Readiness Test, the
Row-Peterson Readiness Test, and the Kuhlmann-Anderson In- 
telligence Tests to beginning first graders. Results of these tests 
were compared at the end of first grade with the Gates Primary 
Reading Test, a teacher rating score card, an author-constructed 
oral reading comprehension test, and a rating based on number 
of books read. The conclusions, based on these correlations, 
were that the ability to perceive tachistoscopically projected images
is an important aspect of reading readiness. In addition,
Fox found that the tachistoscope test was slightly superior to the
two readiness tests in predicting success in reading. 

Robinson, Mozzi, Wittick, and Rosenbloom (1960) in a 
three-year longitudinal study found that the Children’s Perceptual
Achievement Test had only a slight relationship to reading
achievement in first grade. Howe (1963) also found that the 
author-constructed Visual Fusion Threshold Test was only 
moderately related to reading readiness and early reading tasks. 
He concluded that this test accounted only for a physical factor 
as one aspect of reading readiness and that the test bears no 
relation to the intellectual aspects of reading. 

At least two studies (Meyer, 1953; Ames & Walker, 1964) 
have attempted to use the results of a projective test, the Ror- 
schach, as a predictor of later reading achievement. Meyer’s 
study revealed significant differences in the use of the diverse 
Rorschach variables between achieving and retarded readers at 
the beginning of third grade (the Rorschach tests had been 
given to these same children at the kindergarten level). The 
retarded readers were “unable to differentiate in their percep- 
tual experiences beyond rather inaccurate, vague, and mediocre 
global perceptions” (1953, p. 423). Meyer concluded: 

. . . kindergarten Rorschach records may not only be used 
as prognostic tests of reading achievement in the primary 
grades, but may also be used to provide data on first grade 
reading readiness, particularly in the areas of intellectual 
and emotional readiness. (1953, pp. 424-25) 

Ames and Walker (1964) correlated Rorschach scores and 
Wechsler Intelligence Scale for Children (WISC) scores admin- 
istered in kindergarten with the reading test scores of the Stan- 
ford Achievement Test: Reading Test administered to the same 
children at the end of fifth grade. The WISC correlated .57 
with the reading achievement scores and the Rorschach corre- 
lated .53 with the reading achievement scores. The relatively 
long time period between the administration of the predictor 
tests (the Rorschach and the WISC) and the criterion test
makes these correlations even more significant. Of even greater
importance is the multiple correlation (.73) between the WISC 
and Rorschach scores with fifth-grade reading achievement. 
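
How much a second predictor adds depends on how strongly the two predictors correlate with each other. The sketch below uses the standard two-predictor formula for a multiple correlation, R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2). The intercorrelation r_12 between the WISC and the Rorschach is not reported in the passage above, so the values tried here are hypothetical and the function name is illustrative.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors, given the two
    validity coefficients and the correlation between the predictors."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


r_wisc, r_rorschach = 0.57, 0.53  # validity coefficients given in the text

# The predictor intercorrelation is NOT reported; these values are assumptions.
for r_12 in (0.0, 0.14, 0.50):
    combined = multiple_r(r_wisc, r_rorschach, r_12)
    print(f"assumed r_12 = {r_12:.2f} -> multiple R = {combined:.2f}")
```

Under this formula, a multiple correlation near .73 would correspond to a low intercorrelation between the two predictors (roughly .14), which would suggest that the two tests tap largely different sources of variance. This is an illustration under the stated assumptions, not a figure from Ames and Walker.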

While visual discrimination tests have been found to be ade- 
quate predictors of reading achievement, auditory discrimina- 
tion tests have been found to be less predictive. Shea (1964) 
found the Visual Discrimination Word Test: Schonell Reading 
Test administered at the beginning of first grade was superior 
to the Metropolitan Readiness Test in predicting word recogni- 
tion ability after five months of formal instruction in reading. 

Dykstra (1962) found that the use of auditory discrimina- 
tion tests improved very little on the prediction of reading 
achievement afforded by intelligence test scores. A similar find- 
ing was reported by Thompson (1963) who concluded that au- 
ditory discrimination and intelligence are highly intercorrelated 
and that each is about equally predictive of success in primary 
reading. 

Most research dealing with the validity of various tests for 
predicting success in beginning reading has covered only short
time spans. In such studies, the predictors are usually administered
at the end of kindergarten or the beginning of first grade
and the criterion tests are administered at the end of first grade or
the beginning of second grade. In one long-term study, Moreau 
(1950) found that predictions of reading achievement based on 
test scores on the Pintner-Cunningham Primary Test and on the 
Lee-Clark Reading Readiness Test were almost as reliable in 
predicting sixth-grade achievement on the California Basic 
Skills Test as they were for predicting first-grade achievement. 
Moreau cautioned against using the tests for individual predic- 
tion but suggested that both tests could be used as screening de- 
vices. Sutton (1960) examined the variation in reading 
achievement over a seven-year period for children who scored 
high on the Metropolitan Reading Readiness Test in kindergar- 
ten. Sutton concluded that of the 210 five-year-olds who 
scored at this high level, only eight were not by the sixth grade 
working up to capacity predicted by the test.

Using readiness tests

The major conclusion from this review of
the various attempts to measure reading readiness is that readi- 
ness tests generally have positive, but fairly low correlations with 
later reading achievement. While readiness tests have been gen- 
erally the best predictors of achievement, teacher forecasts, tests 
of perception, and measures of language ability all appear to be 
somewhat valid predictors. Intelligence tests, probably because 
of their high correlations with these various measures, do not 
seem to add to the predictive validity of readiness tests. One of 
the most important shortcomings of the predictive studies of 
readiness tests is that the researchers usually fail to describe the 
initial reading program. Until this is done, the predictive validity 
of readiness tests will remain an unanswered question. The evidence
regarding sub-cultural differences in predicting reading
achievement is not very conclusive. On the other hand, there
is fairly substantial evidence that sex differences do affect the 
predictive validity of reading readiness tests, but this is probably 
caused by the uniformity in beginning reading instruction for 
both boys and girls. 

The lack of studies relating the predictions of readiness 
measures to the types of subsequent instructional programs limits 
the conclusions of the many researchers who indicated that 
readiness tests can be used diagnostically. Future research 
needs to focus on investigations in which the readiness test 
scores are used to provide information concerning the need for 
the development of specific readiness skills. Under ideal condi- 
tions, the eventual correlations of these readiness test scores and 
later reading achievement would be reduced to near zero be- 
cause the effect of instruction would be designed to strengthen 
specific weaknesses and, therefore, lower the correlations. Of 
course, this direction for future study is based on the assump- 
tion that the readiness skills measured by present standardized 
readiness tests are the key variables related to reading achieve- 
ment. The increased power of predictions when various tests of 
personality (Rorschach) or perception (Bender-Gestalt) are 
added to readiness test scores indicates that there are several
important predictors which have yet to be examined. In addition,
the interactions of these factors need to be examined. Yet
another area in which research might well prove fruitful is that 
of evaluating long-term prediction of reading readiness tests, 
especially if evidence can be found to support the contention 
that the continued teaching of readiness skills is important to 
later reading success. The search for a non-verbal intelligence 
test which will be useful in predicting achievement for beginning 
reading is probably not a productive area for research because 
of the strong correlations between language factors, intelligence 
tests, and tests of beginning reading. Finally, in regard to pre- 
dicting achievement for beginning reading, a discrimination 
should be made between long-term and short-term prediction. 
It is probable that while naming and recognizing letters of the 
alphabet are good short-term predictors, experiential back- 
ground, motivation, and oral language development may per- 
haps be better long-term predictors. 

Because of the need for additional research, any suggestions 
for using readiness tests are of necessity tentative. However, 
there seems to be enough evidence to warrant the following 
procedures for selecting and using readiness tests: 



1] Select a readiness test which measures the necessary
   prerequisite skills to learning to read for the particular
   reading program that is to follow the readiness
   testing.

2] Develop local norms, both classroom and school, for
   predicting growth.

3] Use teacher judgments, skills check lists, and readiness
   tests to increase the validity and reliability of
   judgments.



A final word on the predictive validity of reading readiness 
tests may indicate the importance of using these tests for diagnosis
rather than prediction. If a readiness test is a perfect predictor,
that is, if the students who score high on the readiness
test also score high on the reading achievement criterion test and
vice versa for the poor readers, there may be something wrong 
with the reading program. For example, if a student scores low 
on a readiness test and that test is measuring his development in 
the skills necessary to learn to read, the instructional program 
should be designed to invalidate the prediction of the readiness 
test.

Measurement of reading-related variables



Previous chapters of this monograph have focused almost 
exclusively on research dealing with specific aspects of reading 
behavior and their measurement. Rarely has research been 
mentioned which has involved the measurement of factors not 
considered part of the reading act, except in terms of whether 
sub-tests of intelligence validly measure behaviors related to 
reading (Chapter 3). However, research has shown that meas- 
ures of psychological variables, including intelligence, and 
physiological variables do influence the measurement of read- 
ing. How great this influence is is still not known nor is it clear 
what implications it has for either the teaching or diagnosis of 
reading achievement. Since the relationship does exist and 
since reading specialists have begun to use information gained 
from intelligence tests and psychological and physiological measures
in assessing reading achievement, no monograph on measurement
and evaluation in reading would be complete without
considering these measures and discussing their possible contri- 
bution to evaluating reading abilities. The review of research 
contained within this chapter is not comprehensive. It does not 
discuss the intrinsic values of measures of intelligence, psycho- 
logical, and physiological variables; rather it concentrates only 
on the relevance their measurement has to evaluating reading 
achievement. The chapter is organized so that intelligence is 
discussed apart from other psychological measures. While intelligence
is a psychological variable, intelligence tests are so extensively
used to assess reading capacity that they merit special
attention. Psychological factors such as social maturity, test
anxiety, and motivation to achieve are discussed separately. 
Finally, the chapter ends with a glance at the importance of 
physiological measures such as lateral dominance and visual and 
auditory perception in analyzing reading performance.



Relationship between intelligence and reading 

The relationship between measures of intelligence and meas- 
ures of reading achievement has long been a source of contro- 
versy. The controversy does not arise over whether intelligence 
is related to reading or whether intelligence test scores correlate 
with reading test scores (which research has shown). Rather, 
the controversy arises out of the explanation given for that rela- 
tionship: are the correlations between reading and intelligence 
test scores due to a similarity between the two? If this is the 
case, could intelligence tests be substituted for reading tests? 
Or, are the correlations caused by a dependence on the part of 
intelligence tests on reading achievement to such an extent that 
the two are indistinguishable? In short, are intelligence tests in 
fact measuring an acquired skill (reading) rather than an innate 
capacity (intelligence)? 

Earlier sections of this monograph have considered the value 
of student performance on sub-tests of intelligence in relation to 
assessing reading achievement (Chapter 3). There it was con- 
cluded that sub-test performance contributed little to evaluating 
reading achievement. Again in Chapter 4, intelligence tests 
were considered in terms of their helpfulness in predicting 
achievement at the readiness level. It was found that while intelligence
tests are useful in assessing reading ability and in predicting
long-term achievement, other factors such as the child’s
language development, self-concepts, experiential background, 
and beginning knowledge of reading are more closely related to 
immediate success in learning to read. This is not to say that 
intelligence does not underlie language development or initial 
reading skill development; it is merely to point out that intelligence
is not the only factor in learning to read and it may not be
the most important one. 

Probably the key to understanding the nature of the relation- 
ship between reading test performance and intelligence test per- 
formance is in studying the variables which affect the correla- 
tion between the two. For instance, research has shown that 
achievement on reading tests is more highly correlated with verbal
intelligence test scores than it is with non-verbal intelligence
test scores. Hage and Stroud (1959) reported this in
their study using the verbal and non-verbal sub-tests of the 
Lorge-Thorndike Intelligence Test, the Pressey Diagnostic 
Reading Tests, and the Iowa Every-Pupil Tests of Basic Skills. 
Triggs, Cantee, Binks, Foster, and Adams (1954) reported 
similar findings when the Wechsler Intelligence Scale for Chil- 
dren was used as the intelligence measure. 

Another variable affecting the correlation between intelli- 
gence and reading test scores appears to be the ages of the sub- 
jects tested (Lennon, 1950; Triggs, Cantee, Binks, Foster, & 
Adams, 1954; Gates, 1921). As chronological age increases, 
the correlations between intelligence and reading increase. 
This can be accounted for, in part, by the fact that at the higher 
levels, those who are still in school are either brighter and/or 
better readers: those who are bright but could not master read- 
ing have dropped out of school, especially by the college level. 
Thus, despite the fact that the relation between reading and in- 
telligence test scores is quite high at the college level, it is still 
fallacious to interpret this as an identity between intelligence 
and reading achievement. Even correlations of .80 leave ap- 
proximately 36 per cent of the variance unaccounted for. 
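
The arithmetic behind this figure is the squared correlation: r^2 is the proportion of variance two measures share, so 1 - r^2 is the proportion left unaccounted for. The brief sketch below illustrates the calculation for several correlations; the function name and the additional values of r are illustrative and do not come from the studies cited.

```python
def unexplained_variance(r: float) -> float:
    """Proportion of criterion variance not shared with the predictor (1 - r^2)."""
    return 1.0 - r * r


# .80 is the value discussed in the text; the others are for comparison only.
for r in (0.40, 0.60, 0.80):
    print(f"r = {r:.2f}: {unexplained_variance(r):.0%} of variance unaccounted for")
```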

The correlation between reading and intelligence test scores 
has been found to be influenced by opportunity to learn. This 
was demonstrated by Wheeler’s (1949) study in which scores 
on reading, intelligence, and general academic achievement tests 
were analyzed for Negro and white children in Tennessee. 
While intelligence test scores of Negro children were 9 per cent 
lower than those of the white children, their reading test scores
were 28 per cent lower and their general academic achievement
was 65 per cent lower. This meant, of course, that the correla- 
tion between reading test scores and intelligence test scores was 
significantly lower for Negro children than for white children. 

Another approach to understanding the nature of the corre- 
lation between performance on reading tests and intelligence 
tests is in the effect that reading has on intelligence test per- 
formance. Research has strongly suggested that reading is key 
to performance on intelligence tests and tests of general aca- 
demic achievement (Durrell, 1933; Jones, 1953; Fitzgerald, 
1960). This is especially true of group intelligence tests which 
are primarily paper and pencil tests requiring a great deal of 
reading. It is fairly well known that poor readers do not per- 
form as well on group intelligence tests as do good readers. 
For instance, Durrell (1933) found that sixth-graders were pe- 
nalized on group intelligence tests in direct proportion to their 
degree of reading retardation. 

The strong influence reading has on intelligence test per- 
formance was emphasized by Jones (1953) who compared a 
group of monoglot English children with 51 Welsh bilingual 
children who had learned English as a second language. Group 
non-verbal and verbal tests of intelligence as well as a silent 
reading test in English were given to both groups. The non- 
verbal test was administered in Welsh to the bilingual group and 
in English to the monoglot group. The other tests were given in 
English to both groups. No differences between the two groups 
were found in the means and variances on the non-verbal tests, 
but there was a significant difference in favor of the monoglot 
group on both the verbal intelligence test and the silent reading 
test. 

Underscoring the point made by Jones’ study, Fitzgerald 
(1960) used WISC scores as measures of the true intelligence 
for students in grades four, five, and six. He then examined the 
effects of reading on group intelligence tests by administering 
the Gates Reading Survey and the verbal battery of the Lorge-Thorndike
Intelligence Test. Fitzgerald discovered that children
retarded in reading scored even lower on group intelligence
tests than their achievement levels, as indicated by the individual
intelligence test scores, would have led one to believe. A
similar finding was reported by Plattor, Plattor, Sherwood, and 
Sherwood (1959) with junior high students. Having compared 
the performance of average readers and retarded readers on the 
Pintner Verbal Tests of General Ability and the Pintner Non-Language
Primary Mental Test, they concluded that reading disability
invalidated the Pintner Verbal test as a measure of intelligence.

Neville (1965) attempted to find out not only if reading 
achievement negatively influences group verbal intelligence test 
scores, but also the level at which the inability to read affects 
group verbal intelligence test performance to such an extent as 
to invalidate their use. The measure of reading achievement 
used by Neville was the Metropolitan Achievement Tests: 
Reading; the group verbal intelligence test used was the verbal
battery of the Lorge-Thorndike Intelligence Tests, Form A; and the
individual intelligence tests used as criteria tests were the 
Wechsler Intelligence Scale for Children and the Peabody Pic- 
ture Vocabulary Test. Neville did find that poor reading nega- 
tively affects group verbal intelligence test scores. As to 
whether the inability to read invalidates the use of group verbal 
intelligence test scores, Neville (1965, p. 260) concluded that 
“a grade 4.0 achievement level in reading is a critical minimum 
for obtaining reasonably valid I.Q.’s for children in intermediate 
grades.” One limitation of Neville’s study is that the grade 
score of 4.0 would not represent the same level of performance 
on all reading achievement tests. If a researcher or teacher 
wanted to determine such a minimum level for a particular test, 
he would have to replicate the pertinent aspects of Neville’s ex- 
periment, using the tests the teacher himself chooses with the 
particular students whose intelligence he is concerned with as- 
sessing. 

Grade level differences seem important in determining the
effect of reading achievement on intelligence test performance.
For instance, Shein (1961) demonstrated that, with college students,
removing the effect of reading comprehension from group
intelligence test scores did not affect the correlation of group in- 
telligence tests with individual intelligence tests. 

The question of whether reading ability influences perform- 
ance on individual intelligence tests has received relatively little 
attention because individual intelligence tests are usually used as 
criterion measures for assessing group intelligence test validity. 
Researchers have not developed other criterion measures which 
could be used to measure true intelligence and serve as a basis 
for assessing the individual intelligence tests themselves. 
Tanyzer (1962) developed an unusual procedure to overcome 
this problem. He hypothesized that the average gain per month 
in reading achievement would be significantly related to im- 
provement on the Wechsler Intelligence Scale for Children. This 
hypothesis was rejected when Tanyzer found that significant im- 
provement in reading achievement for retarded readers had lit- 
tle effect on change in WISC scores. 

Some researchers have focused their attention on the effects 
of reading ability on sub-test scores, rather than total test
scores, on individual intelligence tests. There is some evidence
that poor readers are penalized on certain sub-tests of the WISC
such as Information, Arithmetic, Digit Span, Coding, and Vo- 
cabulary. In several studies reviewed in Chapter 3 (Coleman & 
Rasof, 1963; Graham, 1952; Muir, 1962; Dockrell, 1960), it 
was found that poor readers tended to perform poorly on those 
sub-tests which would seem to involve reading; these same students
had scored higher on non-verbal tests. The researchers
in most of these studies had attempted to compare WISC
performance only for students of equal intelligence. However, 
because of the possible penalty imposed if the student was defi- 
cient in reading, average readers of average intelligence may 
have ended up being compared with poor readers of above av- 
erage intelligence who scored at an average intelligence level 
because of poor performance on those sub-tests affected by
reading. The study carried out by Bond and Fay (1950) also
corroborated this: they discovered that poor readers are penalized
on the verbal items on the Stanford-Binet Intelligence
Scale. 

A different approach to exploring the relation between meas- 
ured intelligence and reading was undertaken by Bliesmer 
(1954). Bliesmer compared the reading abilities of “bright” 
children and “dull” children having the same mental ages. The 
group of “bright” children consisted of 28 third and fourth 
graders who had Stanford-Binet scores of 116 or above; the 
“dull” group consisted of 28 eighth and ninth graders with
Stanford-Binet scores of 84 or below. Both groups had com- 
parable ranges of mental ages; the mean mental age for both 
groups was 11.3. In terms of reading achievement, the children 
in the “bright” group were superior in total reading comprehen- 
sion, locating or recognizing factual details, recognizing main 
ideas, drawing inferences and conclusions, memory of factual de- 
tails, and perception of relationships among definitely stated 
ideas. “Bright” children and “dull” children appeared to be
comparable in reading rate, word recognition, and word mean- 
ing. From Bliesmer’s study, it is apparent that individual in- 
telligence test performance is only one factor affecting reading 
test performance. Even when comparing children of the same 
mental ages, factors such as chronological age, amount of education,
experiential background, motivation, and self-concept affect
reading test performance.

If poor reading negatively affects performance on group and 
perhaps individual intelligence tests, it can also be hypothesized 
that poor reading will negatively affect performance on other 
achievement and aptitude tests. Johnson and Bond (1950) ex- 
amined the readability of ten vocational aptitude and personal- 
ity tests and five group intelligence tests. They concluded that 
many of the tests were too difficult for the level at which their 
use was recommended. They pointed out that if the reading
achievement of students in those grades where these tests are
used is below average for that grade, the tests are probably not
validly measuring the skills or other factors that they were intended
to measure. Johnson and Bond also noted the variations
in readability among test directions as well as test items.

Another approach to the study of the influence of reading on 
achievement test scores is to administer the same test both as a 
listening test and a reading test and examine the differences in 
performance. Lundsteen (1966) found that the correlation be- 
tween a problem-listening test and a problem-reading test was 
only .39 for a group of sixth-grade students. Lundsteen did not 
indicate if the poor readers scored better on the listening than 
the reading test. Contradictory results were reported by Westover
(1958), who compared the achievement of a group of college
students on listening tests with their achievement on reading
tests. No mean differences were found in test performance or in
student preference for one type of test over another. However,
Westover reported that some students consistently performed
better on one type of test.

Studies of the effects of reading on achievement and intelligence
test performance indicate that the degree of the effect depends
on the test used as the criterion of “true” performance.
The use of individual intelligence tests as measures of “true” in- 
telligence level is restricted because they may be affected by 
reading ability. Any studies which use listening test perform- 
ance as the “true” level of ability penalize the poor reader be- 
cause he has had only limited opportunities to gain through 
reading the knowledge required by the test. In addition, there 
is evidence to indicate that as a student progresses through 
school, reading achievement becomes less and less a factor in 
test performance because most students tend to achieve the 
minimum levels of reading ability necessary to comprehend 
most aptitude or achievement tests or they leave school; those 
who do not, the poorest readers, are often pushed out of schools 
(Penty, 1956). Therefore, at the upper high school and col- 
lege levels, the poorest readers are not included in studies which 
examine the influence of poor reading on test performance. 
However, when the effects of reading achievement are removed 
from these tests, they become poor predictors of future academic
success. Because of the great amount of learning which
takes place through reading in the schools, it is important to re- 
member that poor reading not only penalizes achievement test 
performance, but it also affects performance in most school 
learning activities. 

The use of intelligence tests to estimate reading achievement 

The research reviewed on the relationship between performance
on intelligence and on reading tests has led to the conclusion
that poor reading negatively affects performance on intelligence
tests. This finding raises problems for the remedial reading 
teacher who is accustomed to labelling as “remedial reading 
cases,” those students whose reading capacity as measured by 
intelligence tests is found to exceed achievement on reading 
tests. If the performance on intelligence tests upon which the 
evaluation of the student’s capacity is based has been distorted 
by reading disability, the student may in fact be a remedial 
reading case even if the intelligence measure would seem to in- 
dicate that he is reading up to capacity — in simple terms, his ca- 
pacity has been misestimated by a test adversely affected by 
reading disability. There have been a variety of attempts to 
overcome this problem, some of which have used intelligence 
tests which do not involve reading and which are supposedly 
culture-free. 

Neville (1965) compared performance on an individually- 
administered non-reading intelligence test — the Peabody Picture 
Vocabulary Test (PPVT) — to scores on the WISC for good, 
average, and poor readers. The correlations were .42, .65, and 
.66, respectively. On this basis, Neville tentatively suggested 
that the PPVT could be substituted for the WISC with poor 
readers. However, the correlation of .66 still leaves over fifty 
per cent of the variance in the WISC test unaccounted for and 
would make any substitution of one test for the other very tenu- 
ous. Ivanoff and Tempero (1965) did not recommend the 
PPVT for use with poor readers. Using a normal seventh-grade
population, they examined the correlations of the PPVT
with two group tests of mental ability — Henmon-Nelson Test of 
Mental Ability and the California Test of Mental Maturity 
(CTMM). The correlations of the PPVT with the language 
sub-tests of the CTMM were significantly higher (.82) than 
they were with the non-language sub-tests (.59). This seems 
to raise serious questions about the use of the PPVT with the 
poor readers because of its apparent reliance on language functions.
Another picture vocabulary test recommended for use
with poor readers is The Quick Test. Otto and McMenemy 
(1965) found significant but low correlations between The 
Quick Test and WISC scores with retarded readers. However, 
Otto and McMenemy concluded that while The Quick Test 
should not be considered a substitute for the WISC, “it appears 
to be a worthwhile device for use by remedial reading teachers — 
even with minimal training — in obtaining quick estimates of 
poor readers’ I.Q.’s” (1965, p. 197). It should be noted that no
evidence regarding the usefulness of The Quick Test as a ca- 
pacity measure was provided by the authors; the only evidence 
supplied was that the test correlated positively but at a low 
level with WISC scores. 
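
The observation above that a correlation of .66 leaves over fifty per cent of the variance unaccounted for follows from squaring the coefficient. As a minimal sketch (the function name is illustrative and does not come from the studies cited), the unexplained share can be computed for the three correlations reported by Neville (1965):

```python
def unexplained_variance(r):
    """Proportion of variance in one test not accounted for by the other.

    The shared variance is r squared; the remainder, 1 - r**2,
    is the share left unexplained.
    """
    return 1.0 - r ** 2


# Correlations between PPVT and WISC scores reported by Neville (1965)
# for good, average, and poor readers, respectively.
neville_correlations = {"good": 0.42, "average": 0.65, "poor": 0.66}

for group, r in neville_correlations.items():
    share = unexplained_variance(r)
    print(f"{group:8s} r = {r:.2f}  unexplained variance = {share:.2%}")
```

For the poor readers, a correlation of .66 gives a shared variance of about .44, so roughly 56 per cent of the WISC variance remains unexplained by the PPVT.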

Culture-free tests have also been suggested for use with re- 
tarded readers, many of whom have different cultural back- 
grounds. Justman and Aronow (1955) studied whether the 
Davis-Eells Test of General Intelligence in Problem Solving 
Ability, an intelligence test which does not require reading abil- 
ity and is also designed to be free of cultural bias, was a more 
satisfactory measure of intelligence of poor readers than the 
Pintner Intermediate Test. It was hypothesized that poor readers
might achieve higher I.Q.’s on the Davis-Eells test because
of its limited reading demands. The results indicated that the
two tests produced quite comparable results. For sixth-grade 
students with reading grades below 4.0, the mean score for the 
Davis-Eells test was only 4.5 raw score points higher than the 
mean Pintner I.Q. For those students with reading grades between
4.0 and 4.9, the mean of the Davis-Eells test was only 2.5
raw score points higher. 

A usual procedure in using intelligence test scores to determine
reading potential is to subtract a student’s mental grade
from his reading grade. Typical of these procedures is one de- 
scribed by Ravenette (1961). To identify retarded readers, 
Ravenette used students’ ability to define words orally as a ca- 
pacity measure. The weakness of this procedure is that many 
children who are poor readers have limited oral vocabularies, 
although they may have the potential to learn to read. Included 
among such cases would be students with limited or different 
experiential backgrounds (such as foreign-born students) and 
those with hearing deficiencies. Several authors have suggested 
that years in school should also be part of any formula for esti- 
mating reading capacity (Bond & Tinker, 1967; Winkley, 
1962). Another method was advocated by Woodbury (1963) 
who developed a differential index for identifying poor readers. 
Woodbury’s index takes into account the inequality of distances 
between raw score points, the variances of aptitude and
achievement tests, and the correlation of one test with another. 
While Woodbury’s index may be somewhat complex to apply, it
certainly demonstrates the weakness of using the typical age-to-age
or grade-to-grade comparisons in identifying retarded
readers.
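
The contrast drawn above between a simple grade-to-grade subtraction and an index that takes test variances and intercorrelation into account can be illustrated with a short sketch. The second function below is not Woodbury's index, whose formula is not given here; it is a generic regression-based comparison, assumed only for illustration, and the student and test statistics are hypothetical.

```python
def grade_discrepancy(reading_grade, mental_grade):
    """Simple grade-to-grade comparison.

    Negative values mean the student reads below the level
    suggested by the mental grade.
    """
    return reading_grade - mental_grade


def regression_discrepancy(reading, aptitude, reading_mean, reading_sd,
                           aptitude_mean, aptitude_sd, r):
    """Observed minus predicted reading score, in standard-deviation units.

    The predicted reading standard score is r times the aptitude standard
    score, so the prediction regresses toward the mean whenever the
    correlation r between the two tests is below 1.0.
    """
    z_reading = (reading - reading_mean) / reading_sd
    z_aptitude = (aptitude - aptitude_mean) / aptitude_sd
    return z_reading - r * z_aptitude


# Hypothetical student and test statistics, chosen only for illustration.
simple_gap = grade_discrepancy(reading_grade=4.0, mental_grade=6.0)
adjusted_gap = regression_discrepancy(reading=4.0, aptitude=6.0,
                                      reading_mean=5.0, reading_sd=1.0,
                                      aptitude_mean=5.0, aptitude_sd=1.0,
                                      r=0.6)
print(simple_gap, round(adjusted_gap, 2))
```

In this example the regression-based gap is smaller in magnitude than the direct subtraction, because a student with a high aptitude score is not expected to score equally high on an imperfectly correlated reading test.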

A system of computing multiple regression equations for 
predicting reading age from chronological age and WISC verbal 
scores was devised by Fransella and Gerver (1965). Their 
system is limited to the Schonell Reading Tests and to the atypi- 
cal population of British children used in their study; however, 
employing the multiple regression equation does allow the re- 
searcher to take into account several factors and assign them 
appropriate emphasis in predicting reading potential. Bliesmer 
(1956) compared four tests for determining capacity levels for 
retarded readers. The measures included the Stanford-Binet, 
Kuhlmann-Anderson Intelligence Test, California Short-Form 
Test of Mental Maturity, and the Durrell-Sullivan Reading Capacity
Test. The tests were administered to eighty retarded
readers in grades four through seven who were enrolled in a 
reading clinic. Bliesmer reached three conclusions: 1] the 
Durrell-Sullivan test provides the highest estimate of reading 
capacity, followed by the Stanford-Binet; 2] utilizing median 
values of Kuhlmann-Anderson, Durrell-Sullivan, and California 
total scores does not aid greatly in approximating Stanford-Binet
estimates; and 3] none of the group tests yield estimates
which are adequate approximations of Stanford-Binet estimates.

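
The multiple regression approach attributed above to Fransella and Gerver (1965), predicting reading age from chronological age and a verbal score, can be sketched as follows. This is a minimal illustration under stated assumptions: the data are hypothetical and were not taken from their study, the function names are invented for the example, and ordinary least squares is assumed as the fitting method.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Each row holds the predictor values for one child; an intercept
    column of 1s is added. Returns [intercept, b1, b2, ...].
    Assumes the predictors are not collinear (no singularity check).
    """
    X = [[1.0] + list(row) for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical (chronological age, verbal score) pairs and reading ages.
predictors = [(8.0, 90.0), (9.0, 110.0), (10.0, 85.0),
              (11.0, 105.0), (12.0, 95.0)]
reading_ages = [8.6, 9.9, 9.4, 10.7, 10.8]

intercept, b_age, b_verbal = fit_linear(predictors, reading_ages)


def predict_reading_age(chronological_age, verbal_score):
    """Predicted reading age from the fitted regression weights."""
    return intercept + b_age * chronological_age + b_verbal * verbal_score


print(round(predict_reading_age(10.0, 100.0), 2))
```

The fitted weights show how such an equation assigns each factor its own emphasis; any weights obtained this way apply only to the tests and population on which they were computed.
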
The various procedures for determining reading capacity 
which have been developed are quite limited in their usefulness 
because the research on which they are based has failed to re- 
late the estimates of capacity to actual reading improvement 
programs. If a measure of capacity is valid, then a student in a 
remedial reading program with only a small gap between his 
achievement and capacity scores would be expected not to make 
any reading gains other than those expected in normal develop- 
ment. On the other hand, students with a large discrepancy be- 
tween achievement and capacity would be expected to make 
much greater gains. Studies along this line should supply evi- 
dence concerning the “usefulness” of capacity measures. 

Until research does develop some accurate, efficient means
for assessing capacity, what is the practitioner to do? For one
thing, research does tend to indicate that if one were to predict
reading potential, a variety of measures, rather than one kind of
device, is more accurate. However, it is important to bear in
mind that the estimates of accuracy of potential measures always
rely on an intelligence test as the validity
criterion. McDonald suggested that the best estimates of potential
include perceptive observations by classroom teachers:



The willingness of teachers to leave the “safe, familiar”
[illegible] tests to undertake the greater challenge
of making tentative assessments [illegible]
observation and study of the background, nature of
thought processes, personality structure and other attributes
acquired in the course of living is a promising sign. (1964,
p. 118)






The use of other psychological measures in 
assessing reading achievement 

Up to this point, research concerned only with the relation 
between measures of reading and measures of one psychological 
factor, intelligence, has been discussed. But there are other 
psychological factors whose measurement may or may not con- 
tribute to assessing reading skills. These include assessments of 
social maturity, impulsivity and compulsion, interpersonal skills,
test anxiety, and motivation to achieve. There is little research 
evidence concerning the best way to use information gathered 
from the measurement of such factors. The most that research has
indicated is that these factors are somewhat related to reading 
achievement and, as yet, there is little information on how they 
affect reading. A complete review of research on problems and 
procedures in measuring all psychological variables thought to 
be related to reading is beyond the scope of this monograph. 
Those studies reviewed here are only those which have at- 
tempted to develop new methods for measuring some of the 
variables related to reading and/or which have compared meth- 
ods for measuring them. 

Although the relationship between reading and personality 
seems to be well established (Strang, McCullough, & Traxler, 
1961; Wiksell, 1948), the attempts to determine a relationship 
between personality patterns and reading achievement have 
been inconsistent (Holmes, 1961). Tabarlet (1958) studied 
whether the Mental Health Analysis would differentiate poor 
readers from average readers at the fifth-grade level. Immature 
behavior, lack of interpersonal skills, and failure to participate 
in social affairs were found to be characteristic of poor readers. 
Joseph and McDonald (1964) correlated performance on the 
Edwards Personal Preference Schedule with scores on the Diagnostic
Reading Test for 1,475 college freshmen and concluded
that good readers scored higher on such personality factors as 
the need to achieve and the need for change and affiliation while 
poor readers exhibited greater aggression, order, and abasement 
needs. 

Robinson (1953) reviewed a number of studies delineating 
procedures for relating measures of personality to measures of 
reading under three main categories, according to the following 
measurement procedures: 1] informal observations or rating 
scales, 2] psychiatric evaluations, and 3] projective tests. 
Research in the field, Robinson noted, has been inconclusive, 
perhaps because: 

. . . first, different concepts of what constitutes reading 
may be held; second, divergent theories of learning place 
different emphases on the role of personal adjustment in
learning to read; and finally divergent theories of personality
stress varying parameters, appraised and interpreted in
different ways. (1953, p. 98) 

If group standardized personality tests are used as indicators 
of personality, several problems arise. It is always possible that 
some examinees are unable to read the questions because of 
poor reading and, therefore, are invalidly assessed on the par- 
ticular trait being studied. Substantiating this point, Hanes 
(1953) found that the Minnesota Multiphasic Personality Inventory
(MMPI) communicates a different amount of information, and not
necessarily the identical information, to poor readers.

Projective tests for assessing personality characteristics have 
also been correlated with reading achievement in several stud- 
ies. Zimmerman and Allebrand (1965) compared scores on 
the Thematic Apperception Test (TAT) and the California 
Test of Personality to performance on two reading achievement 
tests, the California Reading Test and the Wide Range Reading 
Achievement Test. The results indicate significant differences 
in TAT and California Test of Personality results for good read- 
ers when compared to poor readers. Spache (1954) found the 
Rosenzweig Picture Frustration Test to be a valid measure for
differentiating the personality characteristics of good versus 
poor readers. 

The use of these projective tests for aiding in the diagnosis 
of retarded readers is limited by several problems. First, while 
there seems to be some evidence that projective measures are 
able to distinguish groups of good and poor readers for certain 
personality characteristics, most of the studies (Zimmerman & 
Allebrand, 1965; Spache, 1954) fail to indicate any validity evi- 
dence for the instructional use of the test results. In addition, 
the projective tests are limited by a lack of precise criteria for 
interpreting examinee responses. Reliabilities for most projec- 
tive tests have been shown to be quite inconsistent for different 
examiners and even for the same examiners in test-retest situations
(Cronbach, 1960, Chapter 19). Because of the need for
sophisticated interpretation, most of the projective tests should 
be administered and interpreted only by specially trained per- 
sonnel. A final problem in the use of projective techniques is 
the amount of rapport necessary to achieve valid results. The 
examinee should know what the test is attempting to measure 
and he should have confidence that accurate responses from him 
will provide a reliable measure of his status on the particular 
trait and that this information will aid in planning a useful edu- 
cational program for him. 

The predictive validity of personality tests for forecasting 
reading improvement has been studied by several researchers. 
For one, Kagan (1965) found that measures of reflection- 
impulsivity gathered in the first grade were predictive of reading 
improvement one year later. In general, children classified as 
impulsive in the first grade had the highest reading error scores 
at the end of second grade. 

Pre- and post-test reading improvement scores were pre- 
dicted with the Brown-Holtzman Survey of Study Habits and 
Attitudes, the American Council on Education Psychological 
Examination for College Freshmen (a verbal intelligence test), 
and the hysteria and psychasthenia scores of the MMPI (Chansky
& Bregman, 1957). Of the four predictors, the psychasthenia
score of the MMPI, a measure of the examinees’ obsessions,
compulsions, and phobias, was the best predictor of reading
improvement. It should be noted that the correlation was
negative, but this, of course, did not affect the predictive valid- 
ity. Neville, Pfost, and Dobbs (1967) found that high scores 
on the Test Anxiety Scale for Children were inversely related to 
improvement in reading comprehension, but not to vocabulary 
gain. Their subjects were 54 boys, seven to fourteen years of 
age, who were enrolled in a summer remedial reading program. 

The effect of personality variables on the reliability and validity
of reading test scores has received some attention from researchers.
Rankin (1963) found that for a group of college
students, reading test reliability and validity indexes were higher 
for introverted than for extroverted examinees. Chansky 
(1964) found that two validity scores of the MMPI correlated 
quite highly with reading achievement for a group of 56 college
freshmen. He suggested that this correlation indicated “that 
diagnoses of reading behavior based on standardized tests may 
be in error unless carelessness and test-taking attitudes are con- 
trolled” (Chansky, 1964, p. 90). A study investigating reader 
attitudes toward topics on a comprehension test was undertaken 
with a group of high school students by McKillop (1952). 
McKillop found that for reading comprehension questions of 
specific fact and detail, the relationship of test performance to 
students’ attitudes regarding the topic was negligible. However, 
on questions of judgment, evaluation, and prediction, the rela- 
tion was significant. 

The general conclusion from these studies is that personality 
tests are valid measures for distinguishing good readers from 
poor readers on certain personality characteristics. Good read- 
ers seem to possess a higher need to achieve, higher anxiety, 
and compulsion; poor readers tend to interact less often with 
others and exhibit immature behaviors. The major limitations 
of the measures of personality include the reading difficulty of 
group tests, the interpretation of projective tests, and the rapport
needed between examiners and examinees for adequate
assessment. The validity of personality tests for assessing indi- 
viduals is quite inconclusive. There also appears to be some in- 
dication that personality variables may be affecting the reading 
test performance of students. Finally, one of the most consist- 
ent findings is that certain personality tests seem to be valid pre- 
dictors of reading growth, but much more research is needed on 
this topic. Robinson (1953, p. 98) stated that no conclusive 
relationships between personality variables and reading achieve- 
ment have been established and indicated that future research, 
“if carefully organized and controlled, may identify the most ac- 
ceptable measures of personality and reading, and may clarify 
these controversial issues.” 



The use of physiological measures to 
estimate reading capacity 

Measures of physiological variables such as visual and audi- 
tory acuity, visual and auditory perception, eye movements, lat- 
eral dominance, blood cell hemoglobin counts, muscle tension, 
and kinesthetic recognition have sometimes been shown to 
make a valuable contribution to the assessment of reading capa- 
bilities. However, for every study which shows a relation be- 
tween a particular physiological measure and estimates of read- 
ing achievement, there are other studies demonstrating that no 
relationship exists. Therefore, it is questionable that the meas- 
urement of physiological factors has any validity for diagnosing 
reading achievement. However, many reading clinicians do 
measure physical abilities to determine whether some physical 
disability could be impeding reading development. 

As was the case with research on psychological variables, a 
complete review of the research on problems and procedures 
for measuring physiological variables is not possible in this mon- 
ograph. The short review included here only attempts to draw 
attention to the problems involved in measuring those factors 
which appear most closely related to reading measurement and
which hold the most interest for the practitioner. 

Studies dealing with the validity of procedures for measuring 
visual acuity have been inconclusive due to the lack of consen- 
sus as to what constitutes the minimum amount of vision neces- 
sary for reading. Research in visual acuity has also suffered 
from contradictory findings regarding the relation between poor 
reading and poor vision. For instance, Kelley (1954) found no 
relationship between the Massachusetts Vision Test and any of
several school achievement measures, including reading 
achievement. Kelley’s population consisted of 553 children in 
grades one to six. Edson, Bond, and Cook (1953) also found 
no relationship between poor reading and poor vision. Their 
study, which utilized 188 fourth-grade children, involved com- 
paring the relation of each of ten measures of silent reading 
skills with each of thirteen tests of visual characteristics. The 
vision tests included the American Optical Company E Chart, 
the Eames Eye Tests, the Keystone Ophthalmic Telebinocular, 
and slides from the Betts Visual Sensation and Perception Tests. 
Robinson (1951), confirming these findings, discovered no dif- 
ferences in monocular or binocular reading for randomly se- 
lected intermediate-grade pupils. Still more confirmation for 
lack of a solid relationship between visual acuity and poor read- 
ing is given in Deady’s (1952) review of 17 studies in which 
reading disability was related to various visual anomalies. 
Deady concluded that myopia was not related to reading disabil- 
ity — Eames (1948), for example, had reported only four per 
cent of 1,000 poor readers were myopic; hypermetropia, how- 
ever, was found in 43 per cent of the cases. Similar findings 
were also reported by other researchers (Farris, 1936; Fen- 
drick, 1935). Astigmatism was found not to be closely related 
to poor reading, but among the few cases of poor readers who 
had astigmatic conditions the problems were quite severe 
(Eames, 1948). Other vision problems, such as binocular incoordination,
strabismus, and aniseikonia, were reviewed and
their relation to reading disability was discussed by Deady.
Deady also suggested that the Eames Eye Test or the Massa- 
chusetts Vision Test is a much better screening device for vision 
than the Snellen Chart. When visual anomalies are noted, 
Deady recommended that an examination by an ophthalmolo- 
gist or optometrist should be conducted in cooperation with a 
trained reading clinician. 

Eames (1955), in the conclusion to his study on the influ- 
ence of hypermetropia and myopia on reading achievement, sug- 
gested that reading disability cases should have complete eye 
examinations by a vision specialist. Where this is not possible,
Eames advised that any one of the following tests could be
used: the Eames Eye Test, Keystone Visual Survey, or the Mas- 
sachusetts Vision Test. Checklists based on classroom observa- 
tions are sometimes used as a screening procedure for vision 
problems. Knox (1953) found that an observation checklist of 
symptoms of poor vision did not agree with the results of a battery
of vision screening tests. Knox used a checklist in observing
each of 126 third graders on three different occasions during
the school day. It was concluded that “the number of different
symptoms exhibited by a pupil in third grade is not a good criterion
for referral to a refractionist” (Knox, 1953, p. 100). Smith
(1955) compared stereoscopic instruments with clinical observations
and concluded that stereoscopic tests are not sufficient
for determining the status and functional performance of the
oculomotor apparatus of students with reading disabilities.
Robinson and Huelsman (1953) developed a visual screening
battery for use with poor readers by comparing a list of visual
anomalies to what existing vision tests identified and by developing
new tests when existing tests were not available.

Visual and auditory perception are dependent on both physical
and psychological factors. Perception involves acuity, discrimination,
and memory or organizational functions. Perception
has received a great deal of research attention
within the past decade, but the reliability and
validity of the measurement devices are still subject to much
criticism. For example, while McAninch (1966) suggested that
the measurement of visual perception performance is necessary 
to accurately diagnose reading disability cases, she seriously 
questioned whether present visual perception tests measure skills 
which are relevant to the reading process. McAninch urged fu- 
ture research to investigate which aspects of the visual percep- 
tual process are related to reading. Olson (1966) supported 
this conclusion in his study of the relationship between scores 
on the Marianne Frostig Developmental Test of Visual Percep- 
tion and reading ability. Further support was supplied by Al- 
exander and Money (1965) who found that patients with Turn- 
er’s syndrome, a cyto-genetic deficit characterized by deficits 
of form perception and of directional sense, did not exhibit 
atypical reading behavior. They concluded that if “space form 
and directional sense deficits are involved in the etiology of 
reading retardation, they must be specifically related to the lan- 
guage function and its symbolic representation rather than to 
general cognitional function” (Alexander & Money, 1965, p. 
984). They did not, however, include any suggestions for meas- 
uring this specific perceptual handicap. 

If visual and auditory perception were related to reading dis- 
ability, how might one use a test of visual disability as a diag- 
nostic tool in reading disability and which test should be used? 
Maslow, Frostig, Lefever, and Whittlesey (1964) suggested 
that the Marianne Frostig Developmental Test of Visual Per- 
ception is a valid indicator of learning disability and should 
serve as an integral part in diagnosis. The validity evidence 
they presented, however, is quite limited: references are made
only to studies in which the correlations between the visual perception
test and reading scores range from .4 to .5. Coleman
(1953) used the non-verbal section of the Otis Quick-Scoring 
Mental Ability Test as a measure of visual perception and con- 
cluded that a majority of the subjects who were retarded in 
reading also showed slow development in perceptual differentia- 
tion. Several new tests for measuring visual perception have 
also been examined in the research. Of particular interest are 
Weiner, Wepman, and Morency’s (1965) study of the Chicago
Test of Visual Discrimination and Edwards’ (1960) study of 
three different methods of establishing thresholds for tachisto- 
scopically presented words. 

Attempts to measure auditory perception have met with con- 
flicting results. For instance, Wheeler and Wheeler (1954) ad- 
ministered the musical tone pitch test from the Seashore Meas- 
ures of Musical Talents, a test designed by the authors to 
measure auditory discrimination in oral language, and the vo- 
cabulary and reading comprehension sub-tests of the Metropoli- 
tan Achievement Tests to 629 fourth, fifth, and sixth graders. 
They concluded that while there is some correlation between the 
musical tone test and their oral language discrimination test with 
the reading sub-tests, the relationship is probably too low to be 
of educational value. Wepman (1960), in another study of this 
sort, administered the Wepman Test of Auditory Discrimination 
and a group reading test to 156 first- and second-grade children. 
Wepman found that the reading test grade scores of
the first graders scoring at high, average, and low levels on the
auditory discrimination test were significantly different. The
second graders who scored high on the discrimination test did 
not, however, score significantly higher on the reading test than 
did the low scorers. Clark and Richards (1966) studied the 
Wepman test and found that it indicated a significant deficiency 
in auditory discrimination for disadvantaged pre-school chil- 
dren. They recommended it as a diagnostic tool with similar 
populations. 

Eye movements have also been related to reading achieve- 
ment. Several attempts to measure eye movements were re- 
viewed by Tinker (1958). Those procedures discussed by Tin- 
ker included electrical devices, which seem to be superior for 
measurements which are needed over longer periods, and pho- 
tographic procedures, which involve using corneal reflections 
and recording the movement of the edge of the iris. A study 
relating eye movements to reading achievement was initiated by 
Taylor, Frackenpohl, and Pettee (1960). In order to develop 
grade norms for various kinds of eye movements, they administered
the ophthalmograph to 12,143 subjects in the first grade
through college levels. Differences in eye movements were 
found at various grade levels. The average span or intake unit 
at the first-grade level was .45 of a word; by the college level, 
this average increased to only 1.11 words. In the entire study, 
not one subject was found to have an eye-span intake reaching 
three words.

Measures of lateral dominance have been found to be invalid 
as diagnostic tools in assessing reading disability. Balow and 
Balow (1964) and Balow (1963) found the correlation between
the Harris Tests of Lateral Dominance and various reading
tests (the Gates Advanced Reading Tests and the Gates
Primary Reading Test) for first and second graders was quite
low. Both studies demonstrated that having the dominant hand 
and eye on the same side of the body, on the opposite side of 
the body, or having mixed hand dominance has no significant 
effect on reading achievement. Similar results have been re- 
ported by others (Belmont & Birch, 1965; Capabianco, 1966). 

Various other physical factors, such as blood cell and hemo- 
globin changes, muscle tension, and kinesthetic recognition, 
have been measured and related to reading achievement. 
Eames (1953) studied the blood cells of thirty reading failures 
and concluded that while blood cell and hemoglobin changes 
would not differentiate good from poor readers, they do merit 
attention in the individual case study since the blood count vari- 
ations may actually be one of the many causes contributing to 
reading failure. Wilhelm (1966) attempted to measure muscle 
tension and concluded that measurement of this kind signifi- 
cantly distinguished good readers from poor readers. Hughes, 
Leander, and Ketchum (1945) discovered abnormal electroen- 
cephalographic measurements in 75 per cent of reading disabil- 
ity cases. French (1953) devised a test of kinesthetic recogni- 
tion and found that while it did discriminate between groups of 
good and poor readers, it was too unreliable for individual use. 

From this brief overview of the research, it appears that, with 
the possible exception of visual acuity, physiological measures 
are neither valid nor dependable tools for diagnosing reading 
ability. While the measurement of eye movements has been quite 
reliable, validity studies have clearly demonstrated that eye 
movements are a result rather than a cause of reading performance. 
Furthermore, attempts to alter eye movements have not resulted 
in improved reading achievement. The major research needed 
in measuring physiological factors lies in the area of validity 
studies that are based on procedures other than correlation 
techniques. In many of the studies conducted to date, correlations 
of .4 or .6 have been reported between various factors 
and reading ability. This has led researchers to conclude that 
while the correlations are not too high, it does appear that a 
particular factor is important to measure because it is related in 
some way to reading achievement. Such a conclusion is unwarranted: 
the importance or usefulness of these measurements 
has certainly not been substantiated by research. 

For the reading practitioner, this means that, with the possible 
exception of visual acuity, the measurement of physical factors 
is not relevant to the diagnosis of reading achievement. 
The validity of the measurement of these factors for assessing 
reading is so limited that the reading teacher will certainly find 
that the diagnosis of reading achievement would be markedly 
improved by emphasizing the valid and reliable measurement of 
reading behaviors. 

A note to the practitioner 

Based on the review of research presented in this chapter on 
the measurement of variables related to reading, the practitioner 
may well be left in considerable confusion as to the implications 
which this has for the teaching of reading and the 
planning of instruction. Therefore, it appears worthwhile to review 
the usefulness of measures of intelligence, psychological 
variables, and physiological variables in terms of how a teacher 
can use these measures in the classroom profitably. 









In applying measures of intelligence to the diagnosis of read- 
ing achievement, the practitioner should: 

1] Use a variety of procedures as well as a variety of 
tests to estimate reading capacity — language develop- 
ment tests, intelligence tests (both verbal and perform- 
ance), and measures of experiential background. 

2] Recognize the effect reading achievement has on most 
reading capacity estimates and try to compensate for 
these effects through the use of additional measures 
which do not confound reading skills and reading ca- 
pacity. 

3] Remember that the relation between intelligence test 
performance and reading test performance becomes 
stronger at the older chronological ages and, therefore, 
that the use of reading capacity minus reading achieve- 
ment scores for selecting remedial readers becomes a 
less valid procedure as chronological age increases. 

In using measures of psychological variables other than 
those of intelligence, the practitioner should bear in mind that 
most of these variables have only a very limited relation to read- 
ing achievement. Research has not yet begun to explore the 
nature of this relationship and the contribution it can make to 
diagnosing reading achievement. Therefore, it seems reasona- 
ble to conclude that at the present time the practitioner should 
view this area as one which is in need of further research and 
which offers little in terms of practical application at the class- 
room level. 

The relation of physiological factors such as visual acuity, 
auditory acuity, and general physical status to reading is very 
limited. However, it is important for the teacher to be aware 
that deficiencies in these areas may impede reading skills and 
that physiological measures can be useful only in this sense. 
The measurement of other physiological factors, such as laterality, 
does not, at the present time, appear to be of any value to the 
practitioner, and he should not attempt to use such measurement 
until research is able to show how these measures can be applied 
to the planning of a reading program. 

Summary: test usage and research needs 



This monograph has focused on the contribution which vari- 
ous procedures for measurement can make to the teaching of 
reading. Much of the research reviewed in the monograph has 
cast considerable doubt on the validity and reliability of all test- 
ing instruments in general and group standardized tests in par- 
ticular. This is not to say that measuring devices have no value 
in reading instruction. On the contrary, the preceding chapters 
have stressed that tests can make a valuable contribution to 
classroom practice if they are used with caution and if the test 
user is well aware of their limitations: the test consumer should 
know why he wants to use tests and for what he is testing. In 
addition, the objectives of the tests used and the objectives of 
the instructional program should be closely related. Because 
testing instruments have limited reliability and validity, a variety 
of devices should be used. 

This monograph does not provide detailed procedures for 
using tests in the school program; it does provide guidelines. 
Some major uses of testing in the reading program center 
around the following four points: 1] determining students’ 
instructional reading levels, 2] diagnosing reading skills, 
3] estimating reading growth, and 4] evaluating the instructional 
program. The last point was not treated extensively in this 
monograph. These test uses are discussed individually in the 
following paragraphs in terms of the kinds of devices which can be 
most effectively used in each of the areas. 

Determining instructional levels Informal reading inventories 
based on classroom instructional reading materials provide 
the most valid estimate for determining functional reading 
levels, both in general reading and in reading in the con- 
tent areas. Standardized group tests are not accurate in 
determining students’ instructional reading levels as they 
tend to overestimate them by significant amounts (al- 
though this is dependent on the particular test used). 
Standardized reading tests are valid for ranking students, 
but they should not be used for determining instructional 
levels. 

Diagnosing reading skills Reading diagnosis should be an 
integral, continuous part of reading instruction. In diag- 
nosis, it is particularly important to select a test which de- 
fines reading in the same manner as the reading program 
does. Sub-tests of standardized reading tests have consist- 
ently been shown to have limited validity as measures of 
reading sub-skills when used on a group basis; their valid- 
ity when used on an individual basis has not been studied 
extensively. Therefore, sub-tests of standardized reading 
tests should be used cautiously for diagnostic purposes. 
Informal means of assessment such as teachers’ observa- 
tions and skills check lists when combined with standard- 
ized tests tend to be more valid for diagnosis. 
Physiological and psychological tests have limited value in 
assessing reading skills and determining instructional pro- 
grams. 

Estimating growth Growth in reading ability should be 
measured as it relates to specific goals of the reading pro- 
gram. In fact, it is impossible to evaluate growth unless 
these goals are specified. In addition, the tests used should 
measure skills in the same manner in which they were 
taught in the instructional program. Many times it is 
more efficient not to test all students if the estimate of 
growth desired is for the entire class. Instead, students 
can be selected at random for testing or test items can be 
assigned at random to students. For many reasons deline- 
ated in Chapter 4, it is not good practice to evaluate 
growth on the basis of standardized test norms. If stand- 
ardized test scores are used as the basis for measuring 
growth in a remedial program and if any of the pre-test 
scores are extremely low, a correction for regression 
should be used. Also, any assessment of change should 
take into account the error of measurement of the instru- 
ments used. 
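
The two cautions above can be sketched in a few lines of Python. This is only an illustration under assumptions the monograph does not spell out: it uses one common textbook form of the regression correction (estimated true score = group mean + reliability x (obtained score - group mean)) and treats the standard error of a difference between two scores as the standard error of measurement times the square root of two. The function names, the reliability of .80, and every score below are hypothetical.

```python
import math

def regressed_pretest(pre, group_mean, reliability):
    """Pull an obtained pre-test score toward the mean of the group
    from which the student was selected (assumed correction formula)."""
    return group_mean + reliability * (pre - group_mean)

def corrected_gain(pre, post, group_mean, reliability):
    """Gain measured from the regressed pre-test score, not the raw one."""
    return post - regressed_pretest(pre, group_mean, reliability)

def exceeds_error(gain, sem, n_se=2.0):
    """True if the gain is larger than n_se standard errors of a difference.
    Assumes both testings share the same standard error of measurement."""
    return abs(gain) > n_se * sem * math.sqrt(2.0)

# Hypothetical remedial student: very low pre-test score of 20 in a
# population with mean 50; post-test score of 32; test reliability .80;
# standard error of measurement of 4 score points.
raw_gain = 32 - 20
adj_gain = corrected_gain(20, 32, group_mean=50, reliability=0.80)
print(raw_gain, exceeds_error(raw_gain, sem=4.0))
print(round(adj_gain, 2), exceeds_error(adj_gain, sem=4.0))
```

With these made-up figures, the raw gain of 12 points shrinks to about 6 once the extremely low pre-test score is regressed toward the mean, and the corrected gain no longer exceeds twice the standard error of the difference.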

Evaluating the instructional program Program evaluation 
should be continuous and should provide feedback for im- 
proving the instructional program. Informal observations 
are most useful in evaluating the program; however, the 
behaviors that are to be observed should be clearly spe- 
cified. 

If the guidelines presented above seem sparse, it is because 
the state of knowledge in the field of testing and evaluation in 
reading is so limited. In fact, present measurement practices 
and instruments often are not as helpful as they could be in 
teaching reading. This is not the fault of either test consumers 
or test producers. Test users have been naive about the value 
of tests in the classroom. This has led to gross misuse of tests 
and situations where important stated objectives of reading pro- 
grams have been consistently unevaluated. Compounding the 
problem is the fact that tests have been produced which do not 
meet the needs of the instructional program. More often than 
not, tests fail to provide teachers with information about stu- 
dents’ instructional reading levels, basic reading skills develop- 
ment, and attitudes toward reading. Most reputable test pub- 
lishers do not claim that tests can supply such knowledge, but 
they do imply that the tests provide diagnostic information by 
including reading sub-test profiles and grade level norms. 

Standardized reading tests are the instruments teachers most 
often rely on to determine students’ instructional needs. 
However, these tests are not very helpful to the teaching of 
reading for several basic reasons. First of all, the sub-tests of 
these tests are of questionable validity for determining students’ 
specific reading strengths and weaknesses. Research examining 
the sub-tests of present standardized reading tests for 
diagnosing individuals has been almost wholly neglected. This 
validity problem is certainly the result of the focus of reading tests, 
reading teachers, and reading research on the product of read- 
ing rather than on the reading process. Most tests are designed 
to reveal what a student can do and not how he does it. Only 
recently have reading researchers begun to focus on the reading 
process. 

One of the major shortcomings in the classroom measure- 
ment and evaluation of reading ability stems from incomplete 
knowledge as to the nature of the reading process and the fac- 
tors that influence it. Tests are often developed, interpreted, 
and administered as if reading ability were a skill which had no 
relation to the individual’s experiential background, environ- 
mental factors both within school and society, the classroom set- 
ting, instructional materials, etc. As was pointed out in earlier 
chapters, if the subject content of the reading material, the pur- 
pose for reading, the reading conditions, the difficulty of the 
reading material, or any factors related to the reading situation 
were altered, reading performance would certainly be changed. 

Much research remains to be done before tests can begin to 
make their full contribution to reading instruction. The major 
obstacle in testing and measurement today is the lack of a clear 
understanding of what the reading process entails. Until a the- 
oretical construct of reading is developed and substantiated, the 
value of testing devices will remain extremely limited. 
However, once reading is defined, the avenues for test develop- 
ment will broaden: it will be possible to develop criterion tests 
geared to assess how well an individual reads on the basis of 
what reading is rather than on the basis of how others perform 
(which current tests do). The development of such criteria 
should itself be a major step forward in terms of classroom 
practice; for once the goal for reading instruction is established, 
one can begin to determine what skills an individual should pos- 
sess in order to read and how instruction can be organized to 
teach these skills. Certainly, a clearer conception of the read- 
ing process will facilitate the development of more valid and re- 
liable sub-tests of reading. Another area which has suffered 
because of the lack of knowledge about the reading process has 
been procedures for determining reading capacity. One of the 
pressing needs in estimating capacity is the development of 
measures of reading potential which are not dependent on ac- 
quired reading. Perhaps once reading is defined, the capacity 
to read can be isolated from reading achievement in the tests 
themselves. 

The need for a sound definition of reading cannot be over- 
emphasized: the validity of reading measures depends upon the 
validity of their theoretical basis. Once a sound foundation has 
been established and tests developed, then research can proceed 
in the direction of determining how best to use these tests in the 
reading program. Certainly, the problem of the validity of 
using equivalent forms of tests should be probed, as should 
methods for measuring growth over short time periods. Also, 
much work would need to be done in attempting to combine 
various procedures to measure growth. Perhaps another im- 
portant avenue for further research would be the development 
of tests which measure qualitative as well as quantitative levels 
of response. Current tests measure only the quantitative levels, 
even though they contain items which could be used to assess 
depth and variety of understanding on items such as vocabulary 
and comprehension. This is because the test scores are based 
only on a total number of correct answers and supply no indica- 
tion of how correct or incorrect the responses were. 

An important function of tests, as mentioned earlier in this 
chapter, is in the evaluation of the school program. The 
development of a clear definition of reading would certainly open 
up research in this area. Variables such as instructional 
materials, curriculum, teacher effectiveness, and teaching procedures 
could be investigated in the context of how each contributes to 
developing reading skills. The interaction of these variables 
should also be studied to determine the most effective combina- 
tion for promoting reading development. 

Answers to these research questions should provide a basis 
for the improvement of the measurement and evaluation of 
reading. However, there are a number of new (and several re- 
discovered) procedures which are being tried out and studied. 
These procedures are closely related to the research questions 
posed earlier in this chapter and while their use is generally lim- 
ited, the very fact that they have been developed is encouraging. 

One of the newer approaches to measuring reading behavior 
which seems to hold some promise for improving the analysis of 
reading ability is the greater emphasis on defining purposes for 
reading. This should not only aid in assessing reading ability 
but might also help in improving the teaching of reading. 
Ample evidence is now available that students do not alter their 
reading patterns to achieve particular purposes unless they have 
had guided practice in doing so. If teachers discover that stu- 
dents can improve their reading test scores and, more impor- 
tantly, their reading in content subjects if they establish specific 
purposes for reading, tests which use this procedure will have 
provided a springboard to improved instruction. 

A second procedure which is being developed on some read- 
ing vocabulary and comprehension tests is the use of qualitative 
levels of responses for multiple-choice questions. The attempts 
thus far have been in the direction of developing a more diag- 
nostic utilization of student responses. The usual patterns of 
one correct response and four incorrect responses may be re- 
placed by levels of correct responses. 
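
A minimal sketch of what scoring by levels of correctness might look like follows. The item keys, option letters, and credit values are invented for illustration; the monograph does not describe the scoring scheme of any particular test.

```python
# Hypothetical answer keys: each item maps an option to the credit it earns.
# Options not listed (clearly wrong answers) earn no credit.
GRADED_KEYS = [
    {"a": 2, "c": 1},   # item 1: "a" is the best answer, "c" is partly right
    {"b": 2, "d": 1},   # item 2: "b" is best, "d" is partly right
    {"d": 2},           # item 3: only one acceptable answer
]

def number_right(responses, keys):
    """Conventional score: count the items answered with the best option."""
    return sum(
        1
        for choice, key in zip(responses, keys)
        if key.get(choice, 0) == max(key.values())
    )

def graded_score(responses, keys):
    """Score that credits each response according to its level of correctness."""
    return sum(key.get(choice, 0) for choice, key in zip(responses, keys))

responses = ["c", "b", "a"]
print(number_right(responses, GRADED_KEYS))  # 1
print(graded_score(responses, GRADED_KEYS))  # 3
```

The conventional count treats the partly right answer to the first item exactly like a wrong one, while the graded score preserves the distinction.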

A third development is a tendency to measure reading skills 
as they are actually used in classroom situations. The develop- 
ment of reading vocabulary tests in which the words to be de- 
fined are imbedded in the reading text is not an innovation. 
However, more test producers are using this procedure because 
it provides a more realistic appraisal of reading ability than do 
tests in which vocabulary items are presented in isolation and 
examinees are to select the best synonym from a group of alterna- 
tives. 

The cloze procedure, which was discussed in earlier sections 
of this monograph, also seems to be a testing method which 
more closely resembles actual reading behavior. However, 
cloze techniques do not seem to allow the test developer to ex- 
amine the inferential reading-thinking abilities of examinees as 
well as multiple-choice techniques. Additional research is needed 
to learn more about the construct validity of this measurement 
approach. 

The technical procedures in developing reading tests also are 
improving. Test publishers and, more importantly, test consumers 
are becoming increasingly aware of the American Psychological 
Association’s Standards for Educational and Psychological 
Tests and Manuals (available from the American Psychological 
Association, 1200 Seventeenth Street, N.W., Washington, D.C. 
20036). Many test publishers have improved their tests and 
manuals to meet these specifications. Perhaps the most notable 
technical improvement in the development of standardized read- 
ing tests has been improved sampling procedures for securing 
representative national norms. Not only are these norm popu- 
lations selected with better care, but the description of the norm 
groups is more complete and, therefore, more useful to the test 
user. Some of the needed improvements in the development of 
standardized reading tests are being held back by outdated 
word lists, questionable readability formulas, and lack of 
information about the basic skills of reading. 

Future research and development in testing will, in time, 
provide answers to some of the more fundamental questions of 
what reading is. In the meantime, research will have to con- 
centrate on how to use current testing instruments more effec- 
tively. Hopefully, this monograph is a step in that direction. 

GLOSSARY 



Achievement test — a measure of the degree to which a person has at- 
tained objectives of instruction or education. 



Age equivalent — the chronological age for which a given score is the 
real or estimated average score. 

Cloze procedure — the method for determining a student’s reading com- 
prehension on a particular selection. The procedure involves eliminating 
every fifth word (or some other number may be chosen) and asking the 
examinee to supply the missing word. 
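
The deletion step described in this entry can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, words are split on whitespace (so punctuation stays attached to the word it follows), and responses are scored by exact-word match, which is one possible scoring rule among several.

```python
import string

def make_cloze(text, every=5, blank="_____"):
    """Replace every `every`-th word with a blank.
    Returns the mutilated passage and the list of deleted words."""
    words = text.split()
    deleted = []
    for i in range(every - 1, len(words), every):
        deleted.append(words[i])
        words[i] = blank
    return " ".join(words), deleted

def score_cloze(deleted, responses):
    """Count exact-word matches, ignoring case and attached punctuation."""
    def norm(word):
        return word.strip(string.punctuation).lower()
    return sum(1 for d, r in zip(deleted, responses) if norm(d) == norm(r))

passage, answers = make_cloze(
    "one two three four five six seven eight nine ten eleven"
)
print(passage)   # one two three four _____ six seven eight nine _____ eleven
print(answers)   # ['five', 'ten']
print(score_cloze(answers, ["Five", "tan"]))  # 1
```

Changing the `every` argument gives the "some other number" of words mentioned in the definition.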






Coefficient of correlation — a measure of the degree of relationship be- 
tween two sets of measures either for the same group of individuals or 
for paired individuals (e.g., twins). The Pearson product-moment coeffi- 
cient is r; the Spearman rank coefficient is rho (ρ). 
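
Both coefficients named in this entry can be computed directly from their definitions. The sketch below is illustrative only: the function names and the sample scores are made up, tied scores are given averaged ranks (a common convention, assumed here), and rho is obtained by applying the product-moment formula to the ranks.

```python
import math

def pearson_r(x, y):
    """Product-moment coefficient for two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank scores from 1 (lowest); tied scores share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Rank coefficient: the product-moment formula applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical scores of five pupils on two tests.
test_a = [1, 2, 3, 4, 5]
test_b = [1, 4, 9, 16, 25]
print(round(pearson_r(test_a, test_b), 3))
print(round(spearman_rho(test_a, test_b), 3))
```

In this invented example the two tests rank the pupils identically, so rho is 1 even though r is slightly lower, because the relationship between the scores is not a straight line.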



Coefficient of equivalence — the type of reliability coefficient obtained 
when parallel or equivalent forms of the same test are administered to 
the same individuals. 



Concurrent validity — the degree to which an individual’s test perform- 
ance predicts performance on some criterion external to that test. 



Construct validity — the degree to which an individual’s performance on 
a particular test is predictive of the degree to which the individual pos- 
sesses some trait or quality. 



Content validity — the degree to which the results of a particular test 
represent an individual’s performance on a given universal content of 
which the test is a sample. 



Convergent validity — the degree to which a particular test shows agree- 
ment with the measurement of variables that are the same or quite 
similar. 



Criterion — a standard by which a test may be judged or evaluated; a set 
of scores, measures, ratings, products, etc., that a test is designed to pre- 
dict or correlate with as a test of its validity. A set of concepts or ideas 
used in judging the content of a test, in estimating its content or logical 
validity. 



This glossary has been taken, with some revision and some additions, from H. 
Remmers, N. L. Gage, and J. F. Rummel, A Practical Introduction to Measurement 
and Evaluation (N.Y.: Harper & Row, 19 ); R. T. Lennon, A Glossary of 100 
Measurement Terms (N.Y.: Harcourt, Brace, & World, 19 ); and H. B. English 
and A. C. English, A Comprehensive Dictionary of Psychological and Psycho- 
analytical Terms (N.Y.: Longmans, Green, 1958). This material is presented here 
by permission of David McKay Company, Inc.; Harcourt, Brace and World; and 
Harper and Row. 

Criterion test — a test whose primary purpose is to determine the extent 
to which individuals in a group have learned or mastered a given unit of 
instruction. This type of test is intended not to differentiate widely 
among individuals, but to determine whether or not a group of students 
has achieved a certain level of proficiency. It is used primarily to deter- 
mine whether or not the group is ready to advance to another unit of 
instruction. 



Culture-free test — a test devised to rule out the effects of an individual’s 
previous environment on his score. No such test is actually possible. A 
“culture-free” test does not rule out such effects but merely makes them 
equivalent for the persons to be compared. 

Diagnostic test — a test used to diagnose or to show an individual’s 
strengths and weaknesses in a specific area of study. It yields measures 
of the components, or subparts, of some larger body of information or 
skill. 



Discriminant validity — the degree to which a particular test does not 
overlap with measurement of variables from which it should differ. 



Equivalent form — any of two or more forms of a test that are closely 
parallel in content and in difficulty of items, and that yield very similar 
average scores, measures of variability, and reliability estimates for a 
given group. 



Error of measurement — See Standard error. 



Face validity — the apparent validity of a test that seems fair to and ap- 
propriate for the individual being measured. The extent to which a test 
is made up of items that seem related to the variable being tested. See 
Validity. 



Factor— a hypothetical trait derived by factor analysis. 

Factor analysis— a method of computing for determining factors from 
the intercorrelations among a set of variables, usually tests. 



Grade equivalent— the grade level for which a given score is the real or 
estimated average. 



Grade norm — the average score obtained by pupils of a given grade 
placement. Also referred to as the modal grade age. 

Group test— a test that can be administered to a number of individuals 
at the same time by one examiner. 







Hawthorne effect — any effect which causes an experimental group to 
perform differently from expected levels as a result of their knowledge 
of their inclusion in the experiment. 



Individual test — a test that can be administered to only one individual at 
a time. 



Instructional level — the level at which it is expected the student will 
make the maximal amount of growth. 



Intelligence quotient (I.Q.) — the ratio obtained by dividing mental age by 
chronological age, i.e., (MA ÷ CA) × 100. A measure of brightness that 
takes into consideration both score on an intelligence test and age. 
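
The ratio defined in this entry (mental age divided by chronological age, multiplied by 100) can be worked through in a short sketch. The function name and the ages are hypothetical; the only requirement is that both ages be expressed in the same unit, such as years or months.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio I.Q.: mental age over chronological age, times 100.
    Both ages must be in the same unit (years or months)."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100

print(ratio_iq(10, 8))    # 125.0
print(ratio_iq(96, 96))   # 100.0
```

A child whose mental age equals his chronological age therefore obtains a quotient of 100.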



Inventory — an instrument used for cataloguing or listing all or a sample 
of behaviors, interests, attitudes, etc., regarded as useful or relevant for 
a given purpose. It is not a “test” or a measure in the usual sense and 
has no right or wrong answers. 



Local norms — norms that have been made by collecting data in a certain 
school or school system and using them, instead of national or regional 
norms, to evaluate student performance. 



Mental age (M.A.) — the age for which a given score on an intelligence or 
scholastic ability test is average. It is the average age of individuals mak- 
ing the average score on the test. 



National norm — a norm based on nation-wide sampling. 



Nonverbal test — a paper-and-pencil test, usually used with children in 
the primary grades, in which the test items are symbols, figures, and pic- 
tures rather than words; instructions are given orally. 



Norming population — the population which was utilized to establish 
average performance for various age or grade groups. 



Norms — values that describe the performance of various groups on a test 
or inventory. Norms are only descriptive of existing types of perform- 
ance and are not to be regarded as standards or as desirable levels of at- 
tainment. 



Parallel tests — see Equivalent form. 



Percentile rank — the percentage of scores in a distribution equal to or 
lower than the score corresponding to the given rank. 
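
As a worked illustration of this definition, the sketch below counts the scores equal to or lower than a given score and expresses the count as a percentage of the whole distribution. The function name and the scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the distribution equal to or lower than `score`."""
    if not scores:
        raise ValueError("the distribution must contain at least one score")
    at_or_below = sum(1 for s in scores if s <= score)
    return 100.0 * at_or_below / len(scores)

distribution = [10, 20, 30, 40, 50]
print(percentile_rank(distribution, 30))  # 60.0
```
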

Physiological measures — any set of procedures or instruments which are 
used to assess physical development, ability, or capacity. 

Power test — a test intended to measure level of performance rather than 
speed of response; hence, one in which there is either no time limit or a 
very generous one. 

Practice effect — the influence of previous experience with a test on the 
later administration of the same or a similar test. The term is usually 
employed (when the practice effect is not itself what is at issue, but is ... 

Predictive validity — the degree to which the results of a particular test 
are predictive of the examinee’s future performance. 

Product-moment coefficient — ... widely used correlation coefficient ... 
Pearson product-moment coefficient of correlation; symbolized by r. 

Profile — a graphic presentation of the results of an individual’s perform- 
ance on a group of tests. 

Psycholinguistic — a term applied to the analysis of language based on 
an understanding of both cognitive development and language structure. 

Psychological measures — any set of procedures or instruments which are 
used to assess mental ability and personality structure. 

r — see Coefficient of correlation. 

Rank-order correlation (rho, ρ) — a method of obtaining a correlation 
coefficient by assigning ranks to each score of all individuals, and deter- 
mining the relationship between them. Also called rank-difference coef- 
ficient of correlation. 

Raw score — the original, untreated result obtained from a test or other 
measuring instrument. Usually the number of right answers, or points 
on a point scale. 

Readiness test — a test that measures the extent to which an individual 
has achieved a degree of maturity or acquired certain skills or informa- 
tion needed for beginning some new learning activity. Most frequently 
used with preschool children to determine their readiness for entering 
school. 

Reading age — an age-equivalent score assigned to the average score on a 
reading test for individuals at a given age. 



Reliability — the extent to which a test is consistent with itself in measur- 
ing whatever it does measure. 



Reliability coefficient — the coefficient of correlation obtained between 
two forms of a test (alternate-form or parallel-form reliability); between 
scores on repeated administrations of the same test (test-retest reliability); 
between halves of a test properly corrected (split-half reliability); or by 
using the Kuder-Richardson formulas. 



Standard deviation (SD, s, σ) — a measure of the variability or dispersion 
of a set of scores. The more the scores cluster about the mean, the 
smaller the standard deviation. In a normal distribution, approximately 
68 per cent of the scores fall within the range of one SD above and 
below the mean; approximately 95 per cent fall within a range of two 
SD’s, and practically all the scores fall within a range of three SD’s. 



Standard error (SE) — an estimate of the magnitude of the “error of 
measurement” in a score, i.e., the amount by which an obtained score 
differs from a hypothetically true score. The standard error is an 
amount such that in approximately two-thirds of the cases the obtained 
score does not differ more than one SE from the true score. 
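
The glossary does not give a formula for the standard error. A relation commonly used in classical test theory, assumed here rather than taken from the monograph, is SE = SD x sqrt(1 - reliability). The sketch below applies it to build the band of one SE on either side of an obtained score; the function names and all numbers are hypothetical.

```python
import math

def standard_error(sd, reliability):
    """Standard error of measurement from a test's SD and reliability
    coefficient (assumed classical test-theory relation)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def score_band(obtained, sd, reliability, n_se=1.0):
    """Range of n_se standard errors on either side of an obtained score."""
    se = standard_error(sd, reliability)
    return obtained - n_se * se, obtained + n_se * se

# Hypothetical test: SD of 10 score points, reliability coefficient of .91.
low, high = score_band(50, sd=10, reliability=0.91)
print(round(standard_error(10, 0.91), 2))
print(round(low, 2), round(high, 2))
```

Under these assumptions the standard error is about 3 points, so an obtained score of 50 would be read as lying somewhere between roughly 47 and 53 in about two-thirds of cases.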



Standard score, z score — a score in which each individual’s score is ex- 
pressed in terms of the number of standard deviation units of the score 
from the mean. 
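
As an illustration, the conversion described in this entry can be sketched as follows, together with the related T score (mean of fifty, standard deviation of ten) defined later in this glossary. The raw scores are invented, and the sketch assumes the standard deviation is computed over the whole group being described (the population form) rather than estimated from a sample.

```python
import statistics

def z_scores(raw):
    """Express each raw score in standard deviation units from the mean."""
    mean = statistics.mean(raw)
    sd = statistics.pstdev(raw)  # population standard deviation (assumption)
    if sd == 0:
        raise ValueError("all scores are identical; z scores are undefined")
    return [(x - mean) / sd for x in raw]

def t_scores(raw):
    """Rescale z scores to a mean of fifty and a standard deviation of ten."""
    return [50 + 10 * z for z in z_scores(raw)]

raw = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical raw scores: mean 5, SD 2
print(z_scores(raw))
print(t_scores(raw))
```

Because both scales are anchored to the group's own mean and standard deviation, scores from two different tests become comparable once each has been converted in this way.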



Standardized test, standard test — a test that has been given to various 
... under standardized conditions and for which norms have been established. 



Sub-tests — a set of sub-groups of items which are developed to sup- 
posedly measure specific sub-areas of a more general ability. 



Survey test — a test that measures general achievement in a given subject. 
It is generally more concerned with breadth of coverage than 
with specific details or discovery of causal factors. It is most frequently 
used for screening large groups of persons. 



T score — a standard score with a mean of fifty and a standard devia- 
tion of ten; usually used to convert raw scores on two or more tests into 
comparable scores for ease in interpretation. 

True score — the score that would be obtained if we had a perfectly reli- 
able measuring instrument. If it were possible to measure an individual 
over and over again with the same test, without any changes in the indi- 
vidual, the average of all his test scores would be an estimate of his true 
score. True scores are never obtained, but rather are considered hypo- 
thetical values. 



Validity — the extent to which a test measures what it is supposed to 
measure. Validity is defined on the basis of different purposes; different 
kinds of evidence are used in defining types of validity. The most com- 
mon types of validity are: content validity, which describes how well the 
content of the test samples the class of situations or subject-matter about 
which conclusions are to be drawn; concurrent validity, which describes 
how well test scores correspond to measures of concurrent criterion per- 
formance or status; predictive validity, which indicates how well predic- 
tions made from the test are confirmed by evidence gathered at some 
later time; and construct validity, which indicates the degree to which 
certain explanatory constructs or conceptualizations account for per- 
formance on the test. 



Verbal test — a test in which results depend to some extent on the use 
and comprehension of words, as in most paper-and-pencil tests. 



z score — see Standard score. 
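
The score definitions in this glossary can be illustrated with a short Python sketch. The function names and the sample numbers are assumptions made up for the illustration; only the relationships themselves (z as deviation from the mean in SD units, T as a standard score with mean fifty and SD ten, and the one-SE band around an obtained score) come from the definitions above.

```python
def z_score(raw, group_mean, group_sd):
    """Standard (z) score: distance from the mean in standard deviation units."""
    return (raw - group_mean) / group_sd


def t_score(raw, group_mean, group_sd):
    """T score: a standard score rescaled to a mean of 50 and an SD of 10."""
    return 50 + 10 * z_score(raw, group_mean, group_sd)


def se_band(obtained, standard_error, width=1):
    """Range of `width` standard errors on either side of an obtained score.

    With width=1 this is the band that, per the glossary, contains the
    true score in roughly two-thirds of cases.
    """
    return (obtained - width * standard_error, obtained + width * standard_error)


# Hypothetical example: a raw score of 60 on a test with mean 50 and SD 10.
print(z_score(60, 50, 10))   # one SD above the mean
print(t_score(60, 50, 10))   # the same position on the T scale
print(se_band(52, 3))        # obtained score 52, SE of 3
```

Because T scores from different tests share the same mean and SD, they can be compared directly even when the raw-score scales differ.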
Guide to tests and measuring instruments in reading 



Roger Farr 

Indiana University 

Edward G. Summers 

University of British Columbia 



The purpose of the Guide to Tests and Measuring Instru- 
ments in Reading is to provide researchers and practitioners 
with a reference tool for quickly identifying those reading tests 
which meet their particular needs. Test consumers often want 
to locate a number of tests which could be used at a particular 
grade level or those which contain certain sub-tests. This 
Guide should aid in that type of search. For example, if a test 
consumer wanted to identify a number of reading tests that 
could be used with fifth-grade children, he could quickly look 
down the column labelled Grade and find several possible 
choices. After identifying the alternative choices, he could then 
carefully examine copies of the actual test. Test publishers are 
often quite willing to supply sample tests to prospective test 
users. For the readers’ convenience, a list of publishers’ ad- 
dresses follows the Guide. 

Following the Guide are two indexes which will supply the 
test user with the means for conducting a more critical study of 
the tests. The first index is to Buros’ Reading Tests and Re- 
views (Highland Park, N.J.; Gryphon Press, 1968) and Buros’ 
Mental Measurement Yearbooks. By consulting these reviews, 
the test consumer can learn more about the strengths, weaknesses, 
and uses of any of the tests he is interested in. A 
second index provides references to those documents in the six 
ERIC/CRIER basic references which reported use of the test 
in reading research. These basic references include materials 
which have been reported in the published journal literature. 
These references provide valuable information to the researcher 
or test consumer interested in an in-depth study of a particular 
test. 

The Guide contains only those tests which are currently pub- 
lished in the United States. Therefore, those reading tests 
which are either out of print or have been published abroad are 
not included. The tests are organized alphabetically by test 
name. In most cases, the information supplied was taken di- 
rectly from the test or the test manual provided by the pub- 
lisher. Descriptive information for each test includes the data 
listed below. 

1] Test title The title listed on the front cover of the test 
booklet. The date of first publication and most recent re- 
vision are listed in parentheses after the test title. If an 
asterisk appears following the entry, it indicates that the 
test is an individual test; the absence of any notation indi- 
cates that the test is a group test. 

2] Grade or age level The suggested grade level for using 
the test is listed as indicated by the publisher. In several 
instances, the test publisher supplied only age levels. 
These have been converted to grade level equivalencies. 
A dagger (†) has been placed after those which were 
originally given in age levels. 

3] Sub-tests The names of the sub-tests are as indicated 
in the test booklet. 

4] Number of forms The number of forms is listed so 
the potential test user will know if alternative forms are 
available for pre- and post-testing. 



5] Time in minutes The approximate time needed for 
administering the tests is based on information provided by 
the publisher. 

6] Authors The names are listed as they appear on the 
front of the test booklet. 

7] Publisher The publishing company is listed as indi- 
cated on the front of the test booklet. 
Index to Reading Tests and Reviews and 
Mental Measurement Yearbooks 



This index provides a quick reference to the critiques of 
reading tests appearing in Buros’ Reading Tests and Reviews 
(Highland Park, New Jersey: Gryphon Press, 1968) and to 
Buros’ Mental Measurement Yearbooks (Highland Park, New 
Jersey: Gryphon Press, 1938, 1940, 1949, 1953, 1959, 1965). 
These excellent test reviews should be studied before a test 
consumer makes a final test selection. 

Within the index, tests are arranged alphabetically by test 
name. The first column after the test name gives the volume 
number of the Mental Measurements Yearbook (MMY) which 
includes the most recent review of each test. Following this 
MMY number is the test’s number in that yearbook. The sec- 
ond column supplies the page number on which there is a de- 
scriptive listing and/or a critical review of the test in Reading 
Tests and Reviews. Only those tests in the Guide which have 
been reviewed or described in Buros are included in the index. 
For those tests which have been described, but not reviewed in 
Buros, the first column is left blank. 

For example, the column entries for the American School 
Achievement Tests are 6:783 and 290. The first number indi- 
cates that the tests are listed in Buros’ Sixth Mental Measure- 
ment Yearbook and are the 783rd test listing in that book. The 
second number indicates that the tests are also reviewed on 
page 290 of Reading Tests and Reviews. 
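
The column convention just described can be sketched in Python as follows. The type and function names are assumptions for illustration; the `6:783` and `290` values are the ones used in the example above, and an empty first column stands for a test that is described but not reviewed.

```python
from typing import NamedTuple, Optional


class IndexEntry(NamedTuple):
    mmy_volume: Optional[int]       # which Mental Measurement Yearbook
    mmy_test_number: Optional[int]  # the test's number in that yearbook
    rtr_page: Optional[int]         # page in Reading Tests and Reviews


def parse_index_columns(mmy_ref: str, page: str) -> IndexEntry:
    """Split a 'volume:test' reference and a page number into integers."""
    mmy_ref = mmy_ref.strip()
    if mmy_ref:
        volume_text, number_text = mmy_ref.split(":")
        volume, number = int(volume_text), int(number_text)
    else:
        # Blank first column: described in Buros but not reviewed.
        volume = number = None
    page = page.strip()
    return IndexEntry(volume, number, int(page) if page else None)


print(parse_index_columns("6:783", "290"))
print(parse_index_columns("", "12"))
```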

It should be noted that the reviews in Reading Tests and Re- 
views are the same ones which have appeared in the MMY’s. 
The reason why both references are listed here is that a test 
consumer may have access to only one of these references. 
12 


Davis Reading Test
  Series I                                      6:786       291
  Series II                                     6:786       291



Delaware County Silent Reading Test 
                                                Volume and  Page in
                                                test number Reading
                                                in Mental   Tests
                                                Measurement and
Test                                            Yearbooks   Reviews

Developmental Reading Tests
  Primer Reading                                6:787       293
  Lower Primary Reading                         6:787       293
  Upper Primary Reading                         6:787       293
  Intermediate Reading                          6:787       293
Developmental Reading Tests:                    6:832       355
    Silent Reading Diagnostic Tests
Diagnostic Examination of Silent Reading        3:480       89
    Abilities
Diagnostic Reading Scales                       6:821       339
Diagnostic Reading Test
  Lower Level                                   6:823       342
  Upper Level                                   6:823       342
Diagnostic Reading Test, Pupil Progress Series
  Primary Level I                               6:822       340
  Primary Level II                              6:822       340
  Elementary Level                              6:822       340
  Advanced Level                                6:822       340
Dolch Basic Sight Word Test                                 12
Doren Diagnostic Reading Test of                5:659       246
    Word Recognition Skills
Durrell Analysis of Reading Difficulty          5:660       248
Early Detection Inventory                                   15
Emporia Reading Tests
  Primary Reading Test
  Elementary Reading Test
  Intermediate Reading Test
  Junior High School Reading Test
Emporia Silent Reading Test                     2:1534      46
Evaluation Aptitude Test                        5:691       275
Every Pupil Achievement Test
  Primary Reading (Grade 1)                     6:803       320
  Primary Reading (Grades 2-3)                  6:803       320
Flash-X Sight Vocabulary Test                   6:841       367
Functional Readiness Questionnaire for          6:835       360
    School and College Students
Gates-MacGinitie Reading Tests
  Primary A                                     6:792       301
  Primary B                                     6:792       301
  Primary C                                     6:792       301
  Primary Cs                                    6:792       301
  Survey D                                      6:792       301
  Survey E                                      6:792       301
Gates-MacGinitie Reading Tests—                             15
    Readiness Skills
Gates-McKillop Reading Diagnostic Tests         6:824       345
Gilliland Learning Potential Examination
Gilmore Oral Reading Test                       5:671
Gray Oral Reading Test
Group Diagnostic Reading Aptitude and           6:825       348
    Achievement Tests—Intermediate Form
Harrison-Stroud Reading Readiness Profiles      5:677       265
Individual Placement Series—                    6:805       321
    Reading Adequacy “READ” Test
Iowa Every-Pupil Tests of Basic Skills,
    Test A: Silent Reading Comprehension
  Elementary Battery                            4:554       182
  Advanced Battery                              4:554       182
Iowa Every-Pupil Tests of Basic Skills, Test B
  Elementary Battery                            4:588       210
  Advanced Battery                              4:588       210
Iowa Silent Reading Tests
  Elementary                                    6:794       307
  Advanced                                      6:794       307
Iowa Tests of Educational Development
  Test 5: Ability to Interpret Reading          6:852       378
      Materials in the Social Studies
  Test 6: Ability to Interpret Reading          6:853       378
      Materials in the Natural Sciences
  Test 9: Use of Sources of Information         6:858       381
Kelley-Greene Reading Comprehension Test        5:636       226
Keystone Ready to Read Tests                                15
Learning Methods Test                           6:836       340






Lee-Clark Reading Readiness Test                6:846       373
Lee-Clark Reading Test
  Primer                                        6:795       308
  First Reader                                  6:795       308
Lippincott Reading Readiness Test                           15
    (Including Readiness Checklist)
Logical Reasoning                               5:694       279
Los Angeles Elementary Reading Test             4:541       171
McCullough Word Analysis Tests                  6:826       348
McGrath Test of Reading Skills,                             4
    Second Edition
McHugh-McParland Reading Readiness Test                     15
McMenemy Measure of Reading Ability
  Primary                                                   4
  Intermediate                                              4
  Advanced                                                  4
Maintaining Reading Efficiency Tests                        4
Maturity Level for School Entrance              6:847       374
    and Reading Readiness
Metropolitan Achievement Tests: Reading
  Upper Primary Reading Test                    6:797       311
  Elementary Reading Test                       6:797       311
  Intermediate Reading Test                     6:797       311
  Advanced Reading Test                         6:797       311





Metropolitan Readiness Tests                    4:570       194
Minnesota Reading Examinations for              2:1554      59
    College Students
Minnesota Speed of Reading Test for             2:1555      61
    College Students
Monroe’s Standardized Silent Reading Test       6:798       312
Murphy-Durrell Reading Readiness Analysis       5:679       268
National Achievement Tests:                     5:634       225
    High School Reading Test
National Achievement Tests:                     5:648       232
    Municipal Tests: Reading Test
    (Comprehension and Speed)
National Achievement Tests:                     5:646       231
    Reading Comprehension Test (Speer & Smith)
National Achievement Tests:                     5:647       231
    Reading Comprehension Test
    (Crow, Kuhlmann, & Crow)
Neale Analysis of Reading Ability               6:843       370
Nelson-Denny Reading Test:                      6:800       315
    Vocabulary-Comprehension-Rate
Nelson Reading Test                             6:802       320
OC Diagnostic Dictionary Test                   6:861       382
OC Diagnostic Syllabizing Test                  6:827       350
Ohio Diagnostic Reading Test
  Level I                                                   11
  Level II                                                  11
Peabody Library Information Test
  Elementary Level                              3:538       148
  High School Level                             3:538       148
  College Level                                 3:538       148
Perceptual Forms Test                           6:848       374
Phonics Knowledge Survey                        6:828       350
Phonovisual Diagnostic Test                     6:829       350
Pictographic Self Rating Scale                  5:695       280
Pressey Diagnostic Reading Tests                            5
Primary Academic Sentiment Scale                            16
Primary Reading Profiles                        5:665       252
Primary Reading Test:                           5:642       230
    Acorn Achievement Tests
Public School Achievement Tests: Reading        6:807       324
Purdue Reading Test                             5:643       230
Purdue Reading Test for Industrial              5:644       230
    Supervisors: Purdue Personnel Tests
RBH Reading Comprehension Test                  6:817       337
RBH Scientific Reading Test
Reader’s Inventory                                          13
Reader Rater with Self-Scoring Profile          6:837       363
Reading: Adult Basic Education Survey,                      17
    Parts 1 and 2
Reading Eye                                     6:838       363
Reading Skills Diagnostic Test                              11
Reading for Understanding Placement Test
  Junior and General Edition                                6
  Senior Edition                                            6
Reading Versatility Test
  Paper and Pencil Edition                      6:839       365
  Basic Reading Eye Edition                     6:839       365
  Intermediate                                  6:839       365
  Advanced                                      6:839       365
Robinson-Hall Reading Tests                     4:575       197
Roswell-Chall Auditory Blending Test            6:830       352
Roswell-Chall Diagnostic Reading Test           5:667       255
    of Word Analysis Skills
School Readiness Behavior Tests                             16
    Used at the Gesell Institute
School Readiness Checklist, Research Edition                16
School Readiness Survey                                     16
Schrammel-Gray High School and                  3:500       112
    College Reading Test
Screening Test of Academic Readiness                        16
Screening Test for the Assignment of                        16
    Remedial Treatments
Screening Tests for Identifying Children                    13
    with Specific Language Disability
Slosson Oral Reading Test (SORT)                6:844       373
SRA Achievement Series                          6:808       324
SRA Reading Checklist                                       13
SRA Reading Progress Test                                   17
SRA Reading Record                              4:550       177
SRA Tests of Educational Ability
  Level I                                                   411
  Level II                                                  411
  Level III                                                 411
SRA Tests of General Ability                                411
SRA Youth Inventory                                         437
Standardized Oral Reading Check Tests           2:1570      71
Standardized Oral Reading Paragraphs            2:1571      72
Stanford Achievement Test:
    High School Reading Test
Stanford Achievement Test: Reading Tests
  Primary 1                                     6:813       331
  Primary 2                                     6:813       331
  Intermediate I                                6:813       331
  Intermediate II                               6:813       331
  Advanced Paragraph Meaning                    6:813       331
Stanford Diagnostic Reading Test
  Level 1                                                   12
  Level 2                                                   12
Steinbach Test of Reading Readiness                         16
Study Habits Checklist                                      19
Study Habits Inventory                          3:540       150
Study Performance Test                                      19
Study Skills Counseling Evaluation              6:865       384
Survey of Primary Reading Development           6:814       332
Survey of Study Habits and Attitudes            6:856       378
    (SSHA)
Survey of Study Habits,                         4:583       207
    Experimental Edition
Survey Tests of Reading                                     7
Tests of Academic Progress                                  7


Tests of General Educational Development
  Test 2: Interpretation of Reading             5:683
      Materials in the Social Studies
  Test 3: Interpretation of Reading             5:684       270
      Materials in the Natural Sciences
Test of Individual Needs in Reading                         12
Test on the Use of the Dictionary               6:866       386
Tinker Speed of Reading Test                    5:687       270
Traxler High School Reading Test                4:559       187
Traxler Silent Reading Test                     4:560       187
Tyler-Kimber Study Skills Test                  2:1580      80
Understanding Communication                     6:840       365
    (Verbal Comprehension)
Valett Developmental Survey of                              16
    Basic Learning Abilities
Van Wagenen Reading Readiness Scales            3:520       134
Watson-Glaser Critical Thinking Appraisal       6:867       386
Watson Reading Readiness Test                   6:851       377
Wide Range Achievement Test (WRAT)
  Level I                                                   391
  Level II                                                  391
Williams Primary Reading Test
  Primary I                                     5:658       246
  Primary II                                    5:658       246
Williams Reading Test for Grades 4-9
Index to published research literature in reading 



This index provides a reference to research articles which 
have reported use of the tests described in the Guide to Tests 
and Measuring Instruments in Reading. The document base 
used to compile the index consists of six basic references pub- 
lished by ERIC/CRIER. The 3,500 articles cited in the basic 
references were scanned and the tests reported used in the 
research were noted. These tests are listed alphabetically in the 
index. The document numbers are grouped because the tests 
are frequently cited in research articles by name only and not by 
level. For example, a researcher may report using the Califor- 
nia Reading Test, but he may not indicate whether he used the 
primary level or the intermediate level. Following each entry 
are the numbers of those documents in the six ERIC/CRIER 
Basic References which reported use of the test in reading re- 
search. The ERIC/CRIER Basic References include the fol- 
lowing six bibliographies. 

1] Published Research Literature in Reading, 1950- 
1963 (ED 012 834, microfiche $1.50; hard copy $19.90 
from EDRS/NCR) Includes ERIC/CRIER document 
numbers 2885-4803. 

2] Published Research Literature in Reading, 1964- 
1966 (ED 013 969, microfiche $0.75; hard copy $9.10 
from EDRS/NCR) Includes ERIC/CRIER document 
numbers 4804-5345 and 6253-6562. 

3] Recent Doctoral Dissertation Research in Reading, 
(ED 012 693, microfiche $2.00; hard copy $11.05 from 

EDRS/NCR) Includes ERIC/CRIER document numbers 
5348-5727. 

4] International Reading Association Conference Pro- 
ceedings Reports on Secondary Reading, (ED 013 185, 
microfiche $2.25; hard copy $30.70 from EDRS/NCR) 
Includes ERIC/CRIER document numbers 5728-5907. 





5] International Reading Association Conference Pro- 
ceedings Reports on Elementary Reading, (ED 013 197, 
microfiche $4.25; hard copy $56.85 from EDRS/NCR) 
Includes ERIC/CRIER document numbers 5908-6252. 

6] USOE Sponsored Research on Reading, (ED 016 
603, microfiche $0.50; hard copy $5.30 from 
EDRS/NCR) Includes ERIC/CRIER document num- 
bers 6563-6706. 

How to locate a document 

A reader can locate a specific document in the 
ERIC/CRIER six Basic References quite easily. The first 
step is to identify the document number which appears as a four 
digit number in the right column of the Index to ERIC/CRIER 
reading research literature. The second step involves determin- 
ing in which of the six Basic References the document number 
is included (the listing of the six Basic References above gives the 
range of document numbers in each). The documents are listed within 
each of the Basic References in numerical order, with the lowest number 
appearing at the beginning of the reference and the highest number 
at the end. The reader, once he finds the entry, is supplied 
with the full citation of the work and an annotation. He can 
then go to the library and look up the complete article. 
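
The second step, deciding which Basic Reference holds a given document number, amounts to a range lookup. A minimal Python sketch follows; the function and variable names are assumptions for illustration, while the titles and number ranges are those given in the list of six bibliographies above.

```python
from typing import Optional

# (title, list of inclusive document-number ranges), from the list above.
BASIC_REFERENCES = [
    ("Published Research Literature in Reading, 1950-1963",
     [(2885, 4803)]),
    ("Published Research Literature in Reading, 1964-1966",
     [(4804, 5345), (6253, 6562)]),
    ("Recent Doctoral Dissertation Research in Reading",
     [(5348, 5727)]),
    ("International Reading Association Conference Proceedings "
     "Reports on Secondary Reading",
     [(5728, 5907)]),
    ("International Reading Association Conference Proceedings "
     "Reports on Elementary Reading",
     [(5908, 6252)]),
    ("USOE Sponsored Research on Reading",
     [(6563, 6706)]),
]


def find_basic_reference(document_number: int) -> Optional[str]:
    """Return the title of the Basic Reference containing the number."""
    for title, ranges in BASIC_REFERENCES:
        for low, high in ranges:
            if low <= document_number <= high:
                return title
    return None  # number falls outside every listed range


print(find_basic_reference(4986))
```

Note that the 1964-1966 bibliography covers two separate ranges, so a simple "first number greater than" comparison would misplace documents numbered 6253-6562.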

Perhaps an example would be useful to demonstrate how to 
use this reference. The first listing in the Index is for the Botel 
Reading Inventory. The number appearing in the right-hand 
column is 4986. Document 4986 appears in Published Research Literature 
in Reading, 1964-1966 (since it falls between 4804 and 
5345). The full citation given is: 

4986 Santostefano, Sebastiano, Rutledge, Louis, and Randall, David. 
‘Cognitive Styles and Reading Disability,’ Psychology in the Schools, 2 
(Jan. 1965), 57-62. 

Describes a study in which three tests were devised and 
used in three separate, but interdependent, experiments. 
Purpose was to explore whether the cognitive functioning 
of children with reading disability could be differentiated in 
terms of three cognitive styles: 1) focusing-scanning, 2) 
leveling-sharpening, and 3) constricted-flexible. Experimental 
group was 24 retarded readers with mean age 
of 10.94 and mean I.Q. of 92.71. Control group was 23 
nonretarded readers with mean age of 9.91 and mean I.Q. 
of 98.39. All subjects were boys and selected from grades 
three through six. 



The user can then locate Psychology in the Schools in the li- 
brary. 



How to order ERIC/CRIER Basic References 



Each of the ERIC/CRIER Basic References can be ordered from: 

ERIC Document Reproduction Service (EDRS) 
The National Cash Register Company 
4936 Fairmont Avenue 
Bethesda, Maryland 20014 

To order any of these documents, the following information 
must be furnished: 



1] The accession number (ED number) of the desired 
document. 



2] The type of reproduction desired — microfiche or hard 
copy. 

3] The number of copies being ordered. 

4] The method of payment — cash with order, deposit 
account, charge. 

a. Add a special handling charge of 50¢ to all orders. 

b. Add applicable state sales taxes or submit tax 
exemption certificates. 

c. Add a 25% service charge on all orders from 
outside the United States, its territories and possessions. 

d. Payment must accompany orders totaling less 
than $5.00. Do not send stamps. 

e. $20.00 prepaid EDRS coupons are available upon 
request from EDRS. 

EDRS will provide information on charges and deposit accounts 
upon request. 



Test 

Botel Reading Inventory 

California Reading Test 

Lower Primary 
Upper Primary 
Elementary 
Junior High Level 
Advanced 




Citation number in 
ERIC/CRIER 
basic references 

4986 

3179, 3248, 3251, 3258, 
3337, 3345, 3493, 3495, 
3499, 3502, 3516, 3533, 
3544, 3592, 3631, 3646, 
3665, 3671, 3683, 3693, 
3738, 3805, 3821, 3851, 
3861, 3862, 3878, 3918, 
3919, 3936, 3956, 3968, 
3979, 3994, 4023, 4024, 
4028, 4073, 4113, 4114, 
4119, 4126, 4156, 4176, 
4211, 4212, 4245, 4247, 
4250, 4272, 4319, 4374, 
4380, 4385, 4389, 4402, 
4433, 4436, 4441, 4478, 
4481, 4498, 4533, 4571, 
4585, 4587, 4605, 4609, 
4632, 4639, 4645, 4668, 
4700, 4706, 4713, 4714, 
4715, 4722, 4751, 4776, 
4777, 4801, 4825, 4826, 
4835, 4836, 4842, 4844, 
4862, 4901, 4904, 4912, 
4919, 4920, 4948, 4949, 

California Reading Test (cont'd) 




Commerce Reading Comprehension Test 

Cooperative English Tests: Reading 
Comprehension 





4958 , 4966 , 4994 , 4999 , 
5056 , 5061 , 5085 , 5089 , 
5097 , 5113 , 5120 , 5127 , 
5143 , 5161 , 5162 , 5171 , 
5172 , 5187 , 5193 , 5198 , 
5214 , 5221 , 5235 , 5236 , 
5244 , 5303 , 5345 , 6261 , 
6300 , 5929 , 5936 , 6018 , 
6030 , 6038 , 6066 , 6069 , 
6141 , 6157 , 6238 , 6242 , 
5738 , 5373 , 5377 , 5383 , 
5395 , 5415 , 5423 , 5448 , 
5457 , 5469 , 5482 , 5490 , 
5491 , 5499 , 5506 , 5508 , 
5514 , 5515 , 5520 , 5539 , 
5545 , 5558 , 5597 , 5608 , 
5611 , 5644 , 5647 , 5658 , 
5665 , 5670 , 5682 , 5683 , 
5693 , 5713 , 5716 , 5720 , 
5721 , 6563 , 6566 , 6570 , 
6588 , 6589 , 6590 , 6608 , 
6622 , 6632 , 6633 , 6674 

3952 



2885 , 2945 , 3027 , 3065 , 
3066 , 3067 , 3072 , 3116 , 
3117 , 3158 , 3159 , 3172 , 
3198 , 3213 , 3232 , 3242 , 
3251 , 3292 , 3347 , 3355 , 
3366 , 3371 , 3372 , 3377 , 
3389 , 3422 , 3426 , 3440 , 














Cooperative English Tests: Reading 3452, 3483, 3487, 3493, 

Comprehension (cont’d) 3502, 3512, 3514, 3540, 

3556, 3596, 3628, 3639, 
3686, 3695, 3707, 3723, 
3763, 3772, 3801, 3812, 
3847, 3856, 3864, 3865, 
3880, 3928, 3980, 3987, 
4006, 4020, 4032, 4040, 
4046, 4150, 4186, 4226, 
4285, 4298, 4377, 4450, 
4460, 4469, 4476, 4491, 
4522, 4527, 4614, 4639, 
4653, 4675, 4686, 4694, 
4727, 4762, 4800, 4916, 
4987, 5002, 5093, 5157, 
5404, 5436, 5461, 5478, 
5498, 5502, 5588, 5591, 
5663, 5762, 5768, 5827 



Davis Reading Test 4614, 5164, 5354 

Series I 
Series II 



Developmental Reading Tests 

Primer Reading 
Lower Primary Reading 
Upper Primary Reading 
Intermediate Reading 



4725, 4776, 5077, 5142, 
6251, 5426, 5472, 5486, 
5538, 5628 



Developmental Reading Tests: 
Silent Reading Diagnostic Tests 

Diagnostic Reading Scales 



3974, 4247, 4317, 4776, 
5077, 5319, 5426, 5564 

4614, 5467 



Diagnostic Reading Test 
Lower Level 
Upper Level 



3117, 3172, 3182, 3251, 
2887, 2960, 2970, 3066, 
3088, 3144, 3173, 3283, 



Diagnostic Reading Test (cont’d) 



Doren Diagnostic Reading Test of 
Word Recognition Skills 

Durrell Analysis of Reading Difficulty 




3284, 3297, 3308, 3324, 
3344, 3345, 3367, 3384, 
3385, 3394, 3402, 3416, 
3440, 3483, 3498, 3502, 
3514, 3522, 3528, 3530, 
3539, 3545, 3562, 3584, 
3608, 3620, 3634, 3654, 
3659, 3676, 3679, 3719, 
3758, 3772, 3792, 3797, 
3801, 3812, 3849, 3876, 
3902, 3913, 3962, 3967, 
3987, 3989, 3991, 4021, 
4152, 4166, 4174, 4189, 
4190, 4191, 4325, 4426, 
4467, 4476, 4479, 4522, 
4527, 4562, 4586, 4589, 
4639, 4641, 4675, 4703, 
4728, 4755, 4766, 4777, 
4874, 4910, 4968, 4978, 
5021, 5088, 5250, 5369, 
5414, 5429, 5442, 5465, 
5567, 5591, 5659, 5692, 
5697, 6576, 6677, 4614 

5479 



2892, 2915, 2955, 2959, 
3048, 3151, 3226, 3691, 
4091, 4156, 4213, 4325, 
4433, 4614, 4672, 4674, 
4776, 4777, 4842, 5118, 
5236, 5292, 5992, 6224, 
Durrell Analysis of Reading Difficulty (cont’d) 

Gates-McKillop Reading Diagnostic Tests 
Gilmore Oral Reading Test 



Gray Oral Reading Test 



Gray-Votaw-Rogers General 
Achievement Test 
Level I 
Level II 

Harrison-Stroud Reading Readiness Profiles 



Iowa Every-Pupil Tests of Basic Skills, 
Test B 

Elementary Battery 



5496 , 5585 , 5590 , 5613 , 
5725 , 6564 

4834 , 5495 , 6241 , 6675 

3514 , 4383 , 4468 , 4777 , 
4873 , 5085 , 5121 , 5140 , 
5144 , 5255 , 5282 , 5297 , 
5339 , 6300 , 6066 , 6127 , 
6133 , 5387 , 5393 , 5404 , 
5457 , 5469 , 5490 , 5536 , 
5604 , 5623 , 5649 , 5678 , 
5693 , 6597 , 6599 , 6600 , 
6603 , 6608 , 6611 , 6614 , 
6616 , 6621 , 6634 , 6635 , 
6636 , 6638 , 6639 , 6643 , 
6648 , 6679 

2892 , 2886 , 2888 , 2902 , 
2985 , 3002 , 3024 , 3037 , 
3251 , 3315 , 3613 , 3660 , 
3815 , 3870 , 3941 , 4261 , 
4313 , 4563 , 4639 , 4777 , 
4842 , 5076 , 5441 , 5454 , 
5466 , 5491 , 5495 , 5542 , 
5604 , 6563 , 6649 , 6682 

3699 , 3818 , 4776 , 4991 , 
6619 



3564 , 4776 , 4778 , 5013 , 
5021 , 5107 , 5424 , 5427 , 
5433 , 5519 , 5525 , 5535 , 
5565 , 5611 



2904 , 2913 , 3248 , 3322 , 
Iowa Silent Reading Tests (cont’d) 
Iowa Tests of Educational Development 

Test 5: Ability to Interpret Reading 
Materials in the Social Studies 



Test 6: Ability to Interpret Reading 
Materials in the Natural Sciences 

Test 9: Use of Sources of Information 



4424, 4526, 4533, 4614, 
4639, 4677, 4716, 4777, 
5078, 5781, 5358, 5376, 
5425, 5474, 5488, 5505, 
5528, 5533, 5549, 5556, 
5562, 5582, 5650, 5694, 
5712 

3175, 3387, 3418, 3752, 
3801, 4040, 4051, 4096, 
4653, 4762, 4879, 5370, 
5461, 6565 



Kelley-Greene Reading Comprehension Test 3698, 3712 



Lee-Clark Reading Readiness Test 



McCullough Word Analysis Tests 

Metropolitan Achievement Tests: Reading 

Upper Primary Reading Test 
Elementary Reading Test 
Intermediate Reading Test 
Advanced Reading Test 



2900, 2918, 3038, 3040, 
4124, 4126, 4258, 4451, 
4455, 4502, 4605, 4624, 
4776, 4778, 4897, 4949, 
5113, 5135, 5925, 6103, 
6117, 6124, 6224, 5497, 
5506, 5519, 5709, 6589, 
6599, 6600, 6608 

4614 

2900, 2931, 3033, 3101, 
3126, 3226, 3251, 3304, 
3335, 3468, 3486, 3506, 
3519, 3525, 3531, 3592, 
3604, 3642, 3680, 3808, 
3815, 4168, 4178, 4355, 
4441, 4552, 4584, 4602, 
4639, 4697, 4751, 4776, 
4778, 4813, 4814, 4849, 
4882, 4957, 4982, 5066, 

Metropolitan Achievement Tests: 
Reading (cont’d) 



Metropolitan Readiness Tests 





5076, 5124, 5130, 5137, 
5166, 5168, 5198, 5206, 
6282, 6296, 5911, 5996, 
6016, 6124, 5898, 5352, 
5361, 5371, 5383, 5393, 
5401, 5411, 5424, 5456, 
5460, 5481, 5501, 5510, 
5521, 5542, 5580, 5609, 
5619, 5643, 5658, 5680, 
5713, 6564, 6591, 6609, 
6639, 6658, 6668, 6676, 
6695 

2900, 3033, 3314, 3437, 
3459, 3564, 3633, 3642, 
3680, 3699, 3745, 3811, 
3819, 4185, 4194, 4253, 
4430, 4451, 4624, 4639, 
4679, 4697, 4776, 4777, 
4778, 4803, 4870, 5066, 
5112, 5121, 5135, 5140, 
5144, 5149, 5150, 5154, 
5172, 5195, 5198, 5199, 
5200, 5206, 5208, 5255, 
5269, 5282, 5293, 5335, 
5343, 6303, 6112, 6127, 
6133, 5349, 5352, 5408, 
5450, 5459, 5481, 5532, 
5548, 5557, 5587, 5595, 
5628, 5634, 5647, 5648, 
5655, 5680, 5721, 6571, 
6595, 6597, 6598, 6599, 
6600, 6603, 6609, 6610, 
Metropolitan Readiness Tests (cont’d) 



6611, 6612, 6614, 6616, 
6624, 6634, 6635, 6638, 
6641, 6643, 6648, 6651, 
6666, 6668, 6679, 6695 

Minnesota Reading Examinations for College Students
    3698

Minnesota Speed of Reading Test for College Students
    3127, 3321, 3600, 3705

Monroe Reading Aptitude Tests
    3488, 4483, 4803, 5107, 6116, 5433, 5443, 5557

Monroe's Standardized Silent Reading Test
    3338, 5454, 3519, 3613, 4551, 5496

Murphy-Durrell Reading Readiness Analysis
    4776, 4778, 5107, 5121, 5144, 5149, 5199, 5208,
    5224, 5255, 5282, 5293, 5297, 5343, 5525, 6127,
    6133, 6282, 6595, 6597, 6599, 6600, 6603, 6610,
    6611, 6612, 6614, 6616, 6624, 6634, 6638, 6643,
    6648, 6668, 6679

Neale Analysis of Reading Ability
    4607, 4956, 5106, 5994

Nelson-Denny Reading Test: Vocabulary-Comprehension-Rate
    2930, 2946, 3087, 3154, 3440, 3483, 3502, 3524,
    3677, 3698, 4182, 4368, 4812, 4878, 4895, 4959,
    4961, 4970, 4979, 5095, 5184, 5252, 5342, 6253,
    6297, 5752, 5405, 5546, 5617, 5630

Nelson Reading Test
    3115, 3251, 3383, 3421, 3667, 3825

Pressey Diagnostic Reading Tests
    3848, 4777





Primary Reading Profiles
    4151, 4323, 4776

Robinson-Hall Reading Tests
    3470

SRA Achievement Series
    4414, 4639, 4777, 5138, 5166, 5171, 5427, 5480,
    5581, 5696, 6618, 6677

SRA Reading Record
    3092, 3251, 3483, 4614, 5090

Stanford Achievement Test: Reading Tests
(Primary I, Primary II, Intermediate I, Intermediate II, Advanced)
    2904, 2943, 2953, 2967, 2993, 3013, 3064, 3138,
    3181, 3196, 3226, 3351, 3366, 3374, 3462, 3463,
    3464, 3514, 3531, 3652, 3660, 3661, 3662, 3705,
    3791, 3814, 3850, 3870, 3880, 3927, 3974, 3997,
    3999, 4090, 4125, 4143, 4192, 4210, 4220, 4254,
    4255, 4280, 4293, 4332, 4355, 4385, 4412, 4430,
    4436, 4441, 4493, 4494, 4535, 4572, 4639, 4646,
    4670, 4687, 4725, 4726, 4740, 4776, 4777, 4778,
    4798, 4799, 4808, 4827, 4844, 4846, 4882, 4897,
    4931, 4951, 4981, 5021, 5063, 5114, 5121, 5137,
    5140, 5144, 5149, 5150, 5158, 5197, 5200, 5202,
    5213, 5219, 5224, 5232, 5234, 5255, 5275, 5293,
    5297, 5313, 5315, 5328, 6276, 6282, 6300, 5917,
    5936, 5968, 6018, 6066, 6073, 6127, 6133, 6178,
    5836, 5353, 5365, 5366, 5370, 5375, 5383, 5418,
    5432, 5438, 5442, 5463, 5504, 5559, 5580, 5584,
    5609, 5632, 5636, 5640, 5667, 5668, 5673, 5711,
    5714, 6564, 6571, 6576, 6586, 6587, 6591, 6593,
    6595, 6597, 6598, 6599, 6600, 6603, 6608, 6610,
    6611, 6613, 6614, 6616, 6624, 6629, 6634, 6635,
    6638, 6639, 6643, 6648, 6658, 6668, 6674, 6678,
    6679

Tinker Speed of Reading Test
    3281, 3282, 3364, 3576, 3908, 3909

Traxler High School Reading Test
    3251, 3452, 3483, 4879

Traxler Silent Reading Test
    3125, 3251, 3713, 5021

Van Wagenen Reading Readiness Scales
    3441, 3488, 4776, 4778

Wide Range Achievement Test (WRAT)
(Level I, Level II)
    3315, 3338, 3490, 3844, 4008, 4160, 4261, 4435,
    4443, 4621, 4639, 4776, 4777, 4873, 5073, 5094,
    5111, 5345, 6294, 6115, 6223, 5652, 6620

Williams Primary Reading Test
    4870



Primary II 





The Nineteen Clearinghouses in the ERIC System 



Adult Education 

Syracuse University 
107 Roney Lane 
Syracuse, N.Y. 13210 

Counseling and Personnel Services 

Services Information Center
611 Church Street
Ann Arbor, Michigan 48104

Disadvantaged 

Teachers College 
Columbia University 
New York, New York 10027 

Early Childhood Education 

University of Illinois
805 West Pennsylvania Avenue
Urbana, Illinois 61801

Educational Administration 

University of Oregon 
Eugene, Oregon 97403 

Educational Facilities 

University of Wisconsin
606 State St.
Madison, Wisconsin 53703

Educational Media & Technology 

Institute for Communication Research
Stanford University
Palo Alto, California 94305

Exceptional Children 

The Council for Exceptional Children
1499 Jefferson Davis Highway
Arlington, Virginia 22202

Junior Colleges 

University of California at Los Angeles
405 Hilgard Avenue
Los Angeles, California 90024



Higher Education 

George Washington University 
Washington, D.C. 20006 

Library and Information Sciences 

University of Minnesota 
2122 Riverside Avenue 
Minneapolis, Minnesota 55404 

Linguistics 

Center for Applied Linguistics 
1717 Massachusetts Ave., N.W. 
Washington, D.C. 20036 

Reading 

Indiana University
200 Pine Hall
Bloomington, Indiana 47401

Rural Education & Small Schools 

Box AP, University Park Branch 
New Mexico State University 
Las Cruces, New Mexico 88001 

Science Education 

Ohio State University 
1460 West Lane Avenue 
Columbus, Ohio 43221 

Teacher Education 

1156 Fifteenth St., N.W. 
Washington, D.C. 20005 

Teaching of English 

National Council of Teachers of English
508 South Sixth Street
Champaign, Illinois 61820

Teaching of Foreign Languages 

Modern Language Association of America
62 Fifth Avenue
New York, New York 10011

Vocational & Technical Education 

Ohio State University 
1900 Kenny Road 
Columbus, Ohio 43212 



